CUDA out of memory #30

Closed rhljajodia closed 3 years ago

rhljajodia commented 3 years ago

Hello, I get this error while trying to train:

python run/pose2d/ --cfg experiments-local/mixed/resnet50/320_fusion.yaml
=> creating output/mixed/multiview_pose_resnet_50/320_fusion
=> creating log/mixed/multiview_pose_resnet_50/320_fusion2020-10-04-20-04
[Configuration details truncated for brevity - model architecture output]
Traceback (most recent call last):
  File "run/pose2d/", line 189, in <module>
    main()
  File "run/pose2d/", line 160, in main
    train(config, train_loader, model, criterion, optimizer, epoch,
  File "/home/rjajodia/crossview/run/pose2d/../../lib/core/", line 78, in train
    optim.step()
  File "/home/rjajodia/.conda/envs/crossview/lib/python3.8/site-packages/torch/optim/", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/rjajodia/.conda/envs/crossview/lib/python3.8/site-packages/torch/autograd/", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/home/rjajodia/.conda/envs/crossview/lib/python3.8/site-packages/torch/optim/", line 107, in step
    denom = (exp_avg_sq.sqrt() / math.sqrt(bias_correction2)).add_(group['eps'])
RuntimeError: CUDA out of memory. Tried to allocate 158.00 MiB (GPU 0; 7.80 GiB total capacity; 6.48 GiB already allocated; 100.19 MiB free; 6.66 GiB reserved in total by PyTorch)

Hello. I got this error while trying to train. Any fixes for this?

rhljajodia commented 3 years ago

Hello. I managed to solve this. There was a memory leak in the Adam optimizer of PyTorch.

KungZell commented 2 years ago

Hello. I managed to solve this. There was a memory leak in adam optimizer of pytorch.

Hello, I have run into the same problem. Could you give a more detailed solution? Thank you.

rhljajodia commented 2 years ago

Hello. There was a memory leak in the Adam optimizer of the PyTorch package (I don't remember the version) that I had installed for Python 3.8. It was not freeing memory as it should have, so it consumed more and more memory with every call. I remember solving it by patching some code into the optimizer. I suggest you debug by stepping into the specific line where the Adam optimizer is called (line 160 in the traceback above), or try a different version of PyTorch. Unfortunately, I have since gotten rid of the source files, so I cannot share the code I used. All that said, the problem was not caused by the code in this repository; it was a PyTorch problem, at least in my case.
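
One way to act on the debugging advice above is to record allocated GPU memory after every optimizer step and check whether it keeps climbing. The sketch below is a hypothetical helper, not code from this repository and not the fix described in this thread; `memory_in_mib`, `looks_like_leak`, `warmup`, and `tolerance_mib` are illustrative names and defaults chosen here. It assumes a leak shows up as growth on every step once the optimizer has allocated its state.

```python
# Hypothetical diagnostic helpers (names and thresholds are assumptions,
# not part of this repository or of PyTorch).


def memory_in_mib():
    """Return currently allocated CUDA memory in MiB, or None without CUDA."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.memory_allocated() / 2**20


def looks_like_leak(readings_mib, warmup=2, tolerance_mib=1.0):
    """Return True if memory grows on every step after the warm-up steps.

    Adam creates its exp_avg / exp_avg_sq buffers on the first step, so the
    first few readings are expected to jump and are ignored.
    """
    steady = readings_mib[warmup:]
    if len(steady) < 2:
        return False  # not enough data to judge
    return all(b - a > tolerance_mib for a, b in zip(steady, steady[1:]))


# Intended use inside a training loop (sketch only):
#
#     readings = []
#     for batch in train_loader:
#         ...                      # forward, loss.backward()
#         optimizer.step()
#         readings.append(memory_in_mib())
#     print(looks_like_leak(readings))
```

If the readings level off after the first few steps, the out-of-memory error is more likely a matter of batch size or model size than a leak; if they rise on every step, comparing against a different PyTorch version, as suggested above, is a reasonable next check.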