fabsen opened this issue 3 years ago
Oh, this is interesting. The figures are probably not being closed properly. Thanks for pointing this out. I wonder if this is an issue related to the PyTorch Lightning TensorBoardLogger.
This still seems to be an issue. I had been training in a Docker container and so never saw the plots.
After training completed outside a container, my system would nearly crash from the sheer number of open figures. I will take a look at fixing the plot generation issue.
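If the leak is indeed unclosed figures, the general fix pattern looks like the sketch below; `log_prediction_figure` is a hypothetical stand-in for the library's plotting hook, not actual pytorch-forecasting code:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, typical for training jobs
import matplotlib.pyplot as plt

# Hypothetical stand-in for the library's plotting hook. Without
# plt.close(fig), every logged prediction leaves a figure behind in
# matplotlib's global registry, and RAM grows with each logging step.
def log_prediction_figure(logger, global_step):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [1.0, 0.5, 0.8], label="prediction")  # placeholder data
    ax.legend()
    # TensorBoardLogger.experiment is a torch SummaryWriter; add_figure
    # serializes the figure into the event file.
    logger.experiment.add_figure("prediction", fig, global_step=global_step)
    plt.close(fig)  # release the figure once it has been serialized
```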
@fabsen Thank you for the log_interval=-1 solution. I faced the same issue while training in DDP mode on 4x NVIDIA V100; this was a major hurdle for scalability. Libraries I'm using:
pytorch-forecasting==0.9.0, pytorch-lightning==1.6.5, torch==1.11.0, torchmetrics==0.5.0
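For anyone searching later, a minimal sketch of where this workaround goes, assuming a TimeSeriesDataSet named `training` built as in the tutorial (the other hyperparameters are just the tutorial's example values):

```python
from pytorch_forecasting import TemporalFusionTransformer

# `training` is assumed to be the TimeSeriesDataSet from the TFT tutorial.
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,
    log_interval=-1,  # disable periodic prediction plots (no figures created)
)
```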
I experienced the same issue today after upgrading my environment after a while. I was already using log_interval=-1. I am not using DDP mode either; I have a multi-GPU setup but only use one GPU at a time via os.environ["CUDA_VISIBLE_DEVICES"].
Library versions:
pytorch-forecasting==1.0.0, pytorch-lightning==2.1.1, torch==2.0.1, torchmetrics==1.2.0
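For reference, the GPU pinning mentioned above looks roughly like this; the key detail is setting the variable before torch initializes CUDA, and the device index "0" is just an example:

```python
import os

# Must be set before the first CUDA call (safest: before importing torch),
# otherwise the process may already have enumerated all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to this process

import torch

print(torch.cuda.device_count())  # -> 1: only the selected GPU is visible
```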
I am facing the same problem. I tried log_interval=-1 but it did not make any difference. Were you able to solve it?
Expected behavior
I follow the TFT tutorial but want to train on multiple GPUs.
Actual behavior
RAM usage increases drastically over time until we get a memory error (Cannot allocate memory ...)
Changing to log_interval=-1 gets rid of the problem. Also, training on only one GPU doesn't increase RAM usage.
Code to reproduce the problem
Steps that differ from the tutorial:
gpus=[0, 1], accelerator='ddp',
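For completeness, a minimal sketch of that Trainer setup under the pytorch-lightning 1.x API I was on at the time (max_epochs and gradient_clip_val are assumed from the tutorial):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=30,          # tutorial value, assumed
    gpus=[0, 1],            # two GPUs instead of the tutorial's single GPU
    accelerator="ddp",      # pytorch-lightning 1.x spelling for DDP training
    gradient_clip_val=0.1,  # tutorial value, assumed
)
# trainer.fit(tft, train_dataloader, val_dataloader)  # as in the tutorial
```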
/edit: For clarification: RAM usage keeps increasing, not VRAM (which is okay).