Hi Yassine,
I have read your project “pytorch-segmentation” and am very interested in it. I checked out the code to my local machine and then downloaded the “VOCtrainval_11-May-2012” dataset.
However, when I run the training command “python train.py --config config.json”, I get the following error:
TRAIN (1) | Loss: 4.992 | Acc 0.01 mIoU 0.00 | B 6.18 D 0.73 |: 0%| | 1/1323 [00:06<2:16:14, 6.18s/it]
Traceback (most recent call last):
File "train.py", line 61, in <module>
main(config, args.resume)
File "train.py", line 42, in main
trainer.train()
File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/base/base_trainer.py", line 101, in train
results = self._train_epoch(epoch)
File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/trainer.py", line 57, in _train_epoch
output = self.model(data)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
return self.module(*inputs[0], **kwargs[0])
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/models/pspnet.py", line 85, in forward
output = self.master_branch(x)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/jennifer/Documents/Python_projects/pytorch-segmentation/models/pspnet.py", line 37, in forward
output = self.bottleneck(torch.cat(pyramids, dim=1))
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 443, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/anaconda3/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 439, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 398.00 MiB (GPU 0; 7.80 GiB total capacity; 5.77 GiB already allocated; 397.56 MiB free; 5.87 GiB reserved in total by PyTorch)
I have also added “torch.cuda.empty_cache()” before each iteration in “trainer.py”.
Unfortunately, the same error still occurs:
“RuntimeError: CUDA out of memory. Tried to allocate 398.00 MiB (GPU 0; 7.80 GiB total capacity; 5.77 GiB already allocated; 397.56 MiB free; 5.87 GiB reserved in total by PyTorch)”
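To make the numbers in that message easier to read, here is a minimal sketch that parses the message text and works out the arithmetic. It uses only the standard library; the function name `parse_oom_message` and the variable names are hypothetical, chosen just for this illustration, and the regular expressions assume the message wording shown above.

```python
import re

# The error message copied from the traceback above.
MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 398.00 MiB "
    "(GPU 0; 7.80 GiB total capacity; 5.77 GiB already allocated; "
    "397.56 MiB free; 5.87 GiB reserved in total by PyTorch)"
)

# Conversion factors so every size can be compared in MiB.
UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes named in the message, converted to MiB (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(MESSAGE)

# How much the failed request exceeds the memory reported as free.
shortfall = sizes["requested"] - sizes["free"]

# Memory PyTorch holds in its cache but is not currently using for tensors.
cached = sizes["reserved"] - sizes["allocated"]

print(f"requested: {sizes['requested']:.2f} MiB, free: {sizes['free']:.2f} MiB")
print(f"shortfall: {shortfall:.2f} MiB")
print(f"reserved but not allocated: {cached:.2f} MiB")
```

If I read this correctly, the request misses the free memory by less than 1 MiB, and only about 100 MiB of the reserved memory is cache rather than live tensors, which may be why calling “torch.cuda.empty_cache()” did not change anything.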
Could you please give me some suggestions on how to resolve this issue? Thanks!