open-mmlab / mmskeleton

An OpenMMLAB toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Apache License 2.0
2.93k stars · 1.04k forks

How to fix "RuntimeError: CUDA out of memory." #263

Open fatsoengineer opened 4 years ago

fatsoengineer commented 4 years ago

```
Load configuration information from configs/recognition/st_gcn_aaai18/ntu-rgbd-xsub/test.yaml
Downloading: "https://open-mmlab.s3.ap-northeast-2.amazonaws.com/mmskeleton/models/st-gcn/st_gcn.ntu-xsub-300b57d4.pth" to /root/.cache/torch/checkpoints/st_gcn.ntu-xsub-300b57d4.pth
100% 11.9M/11.9M [00:02<00:00, 4.69MB/s]
terminal width is too small (0), please consider widen the terminal for better progressbar visualization
[ ] 0/16487, elapsed: 0s, ETA:
Traceback (most recent call last):
  File "mmskl.py", line 121, in <module>
    main()
  File "mmskl.py", line 115, in main
    call_obj(**cfg.processor_cfg)
  File "/content/drive/My Drive/Backup/Graphs/pose_detection/mmskeleton/mmskeleton/utils/importer.py", line 24, in call_obj
    return import_obj(type)(**kwargs)
  File "/content/drive/My Drive/Backup/Graphs/pose_detection/mmskeleton/mmskeleton/processor/recognition.py", line 33, in test
    output = model(data).data.cpu().numpy()
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/Backup/Graphs/pose_detection/mmskeleton/mmskeleton/models/backbones/st_gcn_aaai18.py", line 90, in forward
    x, _ = gcn(x, self.A * importance)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/Backup/Graphs/pose_detection/mmskeleton/mmskeleton/models/backbones/st_gcn_aaai18.py", line 203, in forward
    x, A = self.gcn(x, A)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/drive/My Drive/Backup/Graphs/pose_detection/mmskeleton/mmskeleton/ops/st_gcn/gconv_origin.py", line 63, in forward
    x = torch.einsum('nkctv,kvw->nctw', (x, A))
  File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 201, in einsum
    return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: CUDA out of memory. Tried to allocate 5.49 GiB (GPU 0; 11.17 GiB total capacity; 7.38 GiB already allocated; 2.53 GiB free; 953.11 MiB cached)
```
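For context on why the suggestions below help: the failing allocation happens inside the `'nkctv,kvw->nctw'` einsum in `gconv_origin.py`, and the size of that contraction's output grows linearly with the batch dimension `n`. A small NumPy sketch of the same contraction (the dimension sizes here are made up purely for illustration, not taken from the ST-GCN config):

```python
import numpy as np

# Hypothetical toy sizes: n=batch, k=spatial kernels, c=channels,
# t=frames, v/w=graph nodes (joints).
n, k, c, t, v, w = 4, 3, 64, 30, 25, 25

x = np.random.rand(n, k, c, t, v)  # feature tensor
A = np.random.rand(k, v, w)        # adjacency / edge-weighting tensor

# Same contraction as in gconv_origin.py: sum over k and v.
out = np.einsum('nkctv,kvw->nctw', x, A)
print(out.shape)  # (4, 64, 30, 25)

# Output memory scales linearly with the batch dimension n,
# which is why a smaller batch size shrinks the failing allocation.
assert out.nbytes == n * out[0].nbytes
```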

mejdidallel commented 4 years ago

Your GPU is running out of memory. You can try these options:
1. Use `torch.cuda.empty_cache()` to clear the memory cache.
2. Kill any other process using the GPU (use `nvidia-smi` to find the process name; if it's Python, `sudo pkill python` will kill it).
3. You may be using large batches. Try reducing the batch size.
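The third suggestion can be automated: catch the OOM `RuntimeError` and retry with a smaller batch until the forward pass fits. A minimal, framework-free sketch of that pattern (`fake_forward` and its 6 GiB budget are hypothetical stand-ins for the real `model(data)` call):

```python
def find_workable_batch_size(run_inference, start_batch_size, min_batch_size=1):
    """Halve the batch size until the forward pass stops raising CUDA OOM."""
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_inference(batch_size)
            return batch_size
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise  # not an OOM error; re-raise it
            batch_size //= 2
    raise RuntimeError("OOM even at the minimum batch size")


# Hypothetical stand-in for model(data): pretend each sample needs
# 1 GiB of GPU memory and only 6 GiB are free.
def fake_forward(batch_size):
    if batch_size > 6:
        raise RuntimeError("CUDA out of memory.")


print(find_workable_batch_size(fake_forward, start_batch_size=64))  # 4
```

In real code you would call the model inside `run_inference`; on a GPU it is also worth calling `torch.cuda.empty_cache()` between retries so the failed attempt's cached blocks are released.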

luoluo-gif commented 4 years ago

@mejdidallel Where should I put `torch.cuda.empty_cache()`? I met the same problem, but I don't know where it should go. Can you give me some advice? Thank you.

mejdidallel commented 4 years ago

> @mejdidallel Where should I put `torch.cuda.empty_cache()`? I met the same problem, but I don't know where it should go. Can you give me some advice? Thank you.

Just put it at the top of the script you are running when you get the error, so the cache is cleared every time you run it. Or you can simply open a Python terminal and type `import torch`, then `torch.cuda.empty_cache()`.