isht7 / pytorch-deeplab-resnet

DeepLab resnet v2 model in pytorch
MIT License

error: out of memory #2

Closed yghlc closed 7 years ago

yghlc commented 7 years ago

Dear all, I tried to train the model on VOC2012 but got an out-of-memory error. The output is as follows:

train.py --lr 0.00025 --wtDecay 0.0005 --gpu0 0 --maxIter 20000 --GTpath /home/hlc/Data/VOCdevkit/VOC2012/SegmentationClassAug --IMpath /home/hlc/Data/VOCdevkit/VOC2012/JPEGImages --LISTpath data/list/train_aug.txt
{'--GTpath': '/home/hlc/Data/VOCdevkit/VOC2012/SegmentationClassAug',
 '--IMpath': '/home/hlc/Data/VOCdevkit/VOC2012/JPEGImages',
 '--LISTpath': 'data/list/train_aug.txt',
 '--gpu0': '0',
 '--help': False,
 '--iterSize': '10',
 '--lr': '0.00025',
 '--maxIter': '20000',
 '--wtDecay': '0.0005'}
('iter = ', 0, 'of', 20000, 'completed, loss = ', array([ 2.40648198], dtype=float32))
('(poly lr policy) learning rate', 0.00025)
('iter = ', 1, 'of', 20000, 'completed, loss = ', array([ 1.26656163], dtype=float32))
('iter = ', 2, 'of', 20000, 'completed, loss = ', array([ 0.74460578], dtype=float32))
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/train.py", line 229, in <module>
    out = model(images)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 201, in forward
    out.append(self.Scale3(x3)) # for 0.5x scale
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 178, in forward
    x = self.layer3(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/container.py", line 64, in forward
    input = module(input)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hlc/codes/PycharmProjects/pytorch-deeplab-resnet/deeplab_resnet.py", line 89, in forward
    out = self.conv2(out)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

Process finished with exit code 1

My GPU has 8 GB of memory, and the batch size used in the code is 1. Any idea what causes this error? Thanks for your help.

isht7 commented 7 years ago

This code was tested on an NVIDIA Titan X GPU, where it occupied about 11.9 GB of memory, so it will not run as-is on an 8 GB GPU. You can use this .pth file, which I trained myself on the train set of VOC2012. If you disable the scale augmentation, I think it should fit on an 8 GB GPU. To do that, you will have to modify the parts of train.py that apply the scale augmentation.
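
A rough sketch of why disabling scale augmentation helps: activation memory in a convolutional network grows roughly with the number of input pixels, so a randomly enlarged input costs noticeably more than the base size. Everything below is an assumption for illustration only: `BASE_DIM`, `SCALE_RANGE`, `pick_input_dim` and `relative_activation_memory` are hypothetical names and values, not copied from train.py, so check the actual crop size and scale range there.

```python
import random

BASE_DIM = 321            # assumed base input side length (hypothetical, check train.py)
SCALE_RANGE = (0.5, 1.3)  # assumed random scale range (hypothetical, check train.py)


def pick_input_dim(augment=True, base_dim=BASE_DIM, scale_range=SCALE_RANGE, rng=random):
    """Return the side length of the square network input for one iteration."""
    if not augment:
        # Scale augmentation disabled: every iteration uses the same input size.
        return base_dim
    scale = rng.uniform(scale_range[0], scale_range[1])
    return int(scale * base_dim)


def relative_activation_memory(dim, base_dim=BASE_DIM):
    """Approximate activation memory relative to the base size (scales with pixel count)."""
    return (dim / float(base_dim)) ** 2


# Largest input the assumed augmentation range can produce, and its relative cost.
worst_case_dim = int(SCALE_RANGE[1] * BASE_DIM)
print("worst-case dim:", worst_case_dim)
print("relative memory:", round(relative_activation_memory(worst_case_dim), 2))

# With augmentation off, the input size (and so the peak memory) stays fixed.
print("fixed dim:", pick_input_dim(augment=False))
```

Under these assumed numbers, the largest augmented input needs roughly 1.7 times the activation memory of the base size. A randomly sized input would also be consistent with the log above, where a few iterations complete before the error, though the traceback does not confirm that this is the cause.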