pclucas14 / pixel-cnn-pp

Pytorch Implementation of OpenAI's PixelCNN++

Out of memory when running main.py #6

Closed SaizhuoWang closed 5 years ago

SaizhuoWang commented 5 years ago

Hi there, thanks for your work. I am running main.py on a single 1080 Ti GPU with 11172 MB of memory, with all parameters left at their defaults. It seems that the PixelCNN++ model consumes all of the memory, and I get this error:

Traceback (most recent call last):
  File "main.py", line 130, in <module>
    output = model(input)
  File "/home/nesa320/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nesa320/wsz_3160105035/pcpp-pytorch/model.py", line 139, in forward
    u, ul = self.down_layers[i](u, ul, u_list, ul_list)
  File "/home/nesa320/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nesa320/wsz_3160105035/pcpp-pytorch/model.py", line 53, in forward
    ul = self.ul_stream[i](ul, a=torch.cat((u, ul_list.pop()), 1))
  File "/home/nesa320/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nesa320/wsz_3160105035/pcpp-pytorch/layers.py", line 137, in forward
    x = self.conv_input(self.nonlinearity(og_x))
  File "/home/nesa320/wsz_3160105035/pcpp-pytorch/model.py", line 63, in <lambda>
    self.resnet_nonlinearity = lambda x : concat_elu(x)
  File "/home/nesa320/wsz_3160105035/pcpp-pytorch/utils.py", line 14, in concat_elu
    return F.elu(torch.cat([x, -x], dim=axis))
RuntimeError: CUDA error: out of memory

I used pdb to trace the program, and a single u = self.u_stream[i](u, a=u_list.pop()) operation takes about 500 MB of memory. The program ran out of memory after executing u_out, ul_out = self.up_layers[i](u_list[-1], ul_list[-1]) twice, with each execution taking about 6000 MB. Can you help me with this? I don't know whether this is normal with the default parameter setup.
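
One way to confirm where the memory goes is to print PyTorch's CUDA memory counters around the suspected calls. This is only an illustrative sketch using torch.cuda.memory_allocated / torch.cuda.max_memory_allocated; the commented-out lines mirror the calls quoted above and are not a patch to the repo's code.

import torch

def report_cuda_memory(tag):
    # memory_allocated() = bytes currently held by tensors on the GPU;
    # max_memory_allocated() = peak usage since the process started (or last reset).
    alloc = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    print('%s: allocated %.0f MB, peak %.0f MB' % (tag, alloc, peak))

# Example usage around one of the calls quoted above:
# report_cuda_memory('before up_layer')
# u_out, ul_out = self.up_layers[i](u_list[-1], ul_list[-1])
# report_cuda_memory('after up_layer')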

vsub21 commented 5 years ago

Did you find a solution to this? I am running into the same issue with a GTX 1080 Ti.

SaizhuoWang commented 5 years ago

@vsub21 Well, it seems that the network structure and the batch size both contribute to this issue. I reduced memory consumption by lowering the network's complexity (the "nr_resnet" and "nr_filters" parameters) and by reducing the batch size. A sketch of that workaround is below.
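
A minimal sketch of that change follows; the PixelCNN class name, its keyword arguments, and the values shown are assumptions based on the parameter names mentioned in this thread, so check model.py and main.py for the exact signatures and defaults.

import torch
from model import PixelCNN  # assumed to be the model class defined in model.py

# Lower nr_resnet / nr_filters relative to the defaults to shrink the network,
# and use a smaller batch; activation memory scales roughly linearly with both.
model = PixelCNN(nr_resnet=3, nr_filters=80).cuda()

batch_size = 16
dummy_input = torch.randn(batch_size, 3, 32, 32).cuda()  # CIFAR-sized input
output = model(dummy_input)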

pclucas14 commented 5 years ago

Hi,

yes, the code is very memory-intensive. It is possible that some refactoring could help with this issue.
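
One generic PyTorch refactoring that trades compute for memory is gradient checkpointing via torch.utils.checkpoint; the sketch below is only an illustration of the idea, not something the current code does.

import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedStack(nn.Module):
    # Wraps a sequence of blocks so their intermediate activations are
    # recomputed during the backward pass instead of being kept in GPU memory.
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        for block in self.blocks:
            # checkpoint() keeps only the block's inputs/outputs and re-runs the
            # block's forward pass during backprop, saving activation memory.
            x = checkpoint(block, x)
        return x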