yassouali / pytorch-segmentation

:art: Semantic segmentation models, datasets and losses implemented in PyTorch.

How to resolve the "RuntimeError: CUDA out of memory" #123

Closed · JenniferYingyiWu2020 closed this issue 2 years ago

JenniferYingyiWu2020 commented 3 years ago

Hi Yassine, I have read through your project “pytorch-segmentation” and am very interested in it. I checked the code out to my local computer and then downloaded the “VOCtrainval_11-May-2012” dataset. However, when I run the training command “python train.py --config config.json”, I get the following error:

```
TRAIN (1) | Loss: 4.992 | Acc 0.01 mIoU 0.00 | B 6.18 D 0.73 |: 0%| | 1/1323 [00:06<2:16:14, 6.18s/it]
Traceback (most recent call last):
  File "train.py", line 61, in <module>
    main(config, args.resume)
  File "train.py", line 42, in main
    trainer.train()
  File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/base/base_trainer.py", line 101, in train
    results = self._train_epoch(epoch)
  File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/trainer.py", line 57, in _train_epoch
    output = self.model(data)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/models/pspnet.py", line 85, in forward
    output = self.master_branch(x)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/models/pspnet.py", line 37, in forward
    output = self.bottleneck(torch.cat(pyramids, dim=1))
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 398.00 MiB (GPU 0; 7.80 GiB total capacity; 5.77 GiB already allocated; 397.56 MiB free; 5.87 GiB reserved in total by PyTorch)
```

In addition, I have added “torch.cuda.empty_cache()” before each iteration in “trainer.py”.
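Roughly, the placement looks like this (a simplified sketch with generic names, not the exact code in trainer.py):

```python
import torch

def train_epoch(model, loader, criterion, optimizer, device="cuda"):
    """Illustrative training loop only -- generic names, not the repo's actual trainer code."""
    model.train()
    for data, target in loader:
        torch.cuda.empty_cache()                    # the call added before each iteration
        data, target = data.to(device), target.to(device)
        output = model(data)                        # forward pass (where the OOM above is raised)
        loss = criterion(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```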


Unfortunately, the above error still occurs:

“RuntimeError: CUDA out of memory. Tried to allocate 398.00 MiB (GPU 0; 7.80 GiB total capacity; 5.77 GiB already allocated; 397.56 MiB free; 5.87 GiB reserved in total by PyTorch)”

So, could you please give me some suggestions on how to resolve this issue? Thanks!

yassouali commented 3 years ago

Hi @JenniferYingyiWu2020,

I think the easiest way to solve this is to reduce the batch size in the config file; you can also reduce the image size.
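For example, something along these lines in config.json (the key names here follow the sample config's train loader section, but double-check them against your own file; the exact values are just a starting point):

```json
{
    "train_loader": {
        "type": "VOC",
        "args": {
            "data_dir": "data/",
            "batch_size": 4,
            "base_size": 300,
            "crop_size": 256
        }
    }
}
```

Halving the batch size or lowering base_size/crop_size both shrink the activations in PSPNet's pyramid pooling and bottleneck, which is where the OOM in your traceback is raised.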