There were issues with high memory usage; I couldn't run the demo myself due to CUDA out-of-memory errors.
The fix is to avoid computing gradients during the forward pass. As written, the gradient bookkeeping accumulated at each iteration even though the gradients were never used, so memory usage grew linearly to store unnecessary tensors.
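A minimal sketch of the kind of change, assuming the demo is PyTorch code; `model`, `data_loader`, and `run_demo` are placeholders for the actual demo objects, not names from this repo:

```python
import torch

# Hypothetical inference loop. Wrapping the forward pass in torch.no_grad()
# stops autograd from recording the computation graph, so intermediate
# tensors are freed after each iteration instead of piling up on the GPU.
@torch.no_grad()
def run_demo(model, data_loader, device="cuda"):
    model.eval()
    outputs = []
    for batch in data_loader:
        batch = batch.to(device)
        # Move results back to the CPU so they don't keep occupying GPU memory.
        outputs.append(model(batch).cpu())
    return outputs
```

`torch.inference_mode()` would work as well and is slightly more aggressive, but `no_grad()` is enough to stop the linear memory growth described above.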