Open · MaximilianPi opened this issue 3 months ago
Hi @MaximilianPi,

I believe this is expected, unfortunately. When building autoregressive models, since you have tensors with `requires_grad = TRUE` in the computation, torch stores the full computation graph in order to be able to (at some point) compute the derivative of `A` with respect to `Parameter`, so memory usage probably keeps growing with every iteration.
The problem might be more visible on the GPU because, at some point, we start calling R's GC at every iteration to try to free more memory. You can read more about how to tune this here: https://torch.mlverse.org/docs/articles/memory-management#cuda
Can you post how you are training your model? A common source of this issue is that you actually need to call `A$detach()` at some point to avoid holding the full graph of computations.
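For illustration, a minimal sketch of that pattern in a training loop (`n_steps`, `model_step()`, `loss_fn()`, `opt`, and `target` are placeholders, not your code):

```r
library(torch)

for (i in seq_len(n_steps)) {
  A <- model_step(A)          # placeholder: autoregressive update of A
  loss <- loss_fn(A, target)  # placeholder: per-step loss
  opt$zero_grad()
  loss$backward()
  opt$step()
  # cut the history here: later iterations start from A's value
  # rather than from the whole chain of previous computations
  A <- A$detach()
}
```

Note that detaching truncates backpropagation at each step, which is usually the intended trade-off in long autoregressive loops.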
Hi @dfalbel,
I am implementing an autoregressive model that has to be built in a for loop, but I have encountered a problem when running the model on the GPU (and on the CPU). For large data there is a threshold (at some iteration i of the loop) where the runtime suddenly increases many-fold and memory starts to fill up. Here is a minimal example (reproducing it may depend on the data and the GPU):
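A simplified sketch of the kind of loop involved (the tensor names, sizes, and update rule below are placeholders for my actual model):

```r
library(torch)

device <- if (cuda_is_available()) "cuda" else "cpu"

W <- torch_randn(100, 100, device = device, requires_grad = TRUE)
A <- torch_randn(100, 100, device = device)

epochs <- 500
res <- numeric(epochs)  # per-iteration runtime in seconds

for (i in seq_len(epochs)) {
  t0 <- Sys.time()
  # autoregressive update: A depends on all of its previous values,
  # so the autograd graph grows with every iteration
  A <- torch_tanh(torch_mm(A, W))
  res[i] <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
}
```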
```r
plot(res, xlab = "epochs", ylab = "runtime")
```
Any ideas what might be happening? (The memory leak and the runtime blow-up also occur on the CPU, but not as severely.)
GPU: NVIDIA A5000, CUDA: 11.7