sanghyun-son / EDSR-PyTorch

PyTorch version of the paper 'Enhanced Deep Residual Networks for Single Image Super-Resolution' (CVPRW 2017)
MIT License

EDSR out of memory at test time #16

Closed muneebaadil closed 6 years ago

muneebaadil commented 6 years ago

I get an out-of-memory error (on a 12GB GPU) when running the final model of EDSR.

THCudaCheck FAIL file=/home/sibt/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    while not t.terminate():
  File "/home/sibt/muneeb/_superResolution/code/trainer.py", line 164, in terminate
    self.test()
  File "/home/sibt/muneeb/_superResolution/code/trainer.py", line 98, in test
    output = _test_forward(input, scale)
  File "/home/sibt/muneeb/_superResolution/code/trainer.py", line 87, in _test_forward
    self.args.chop_shave, self.args.chop_size)
  File "/home/sibt/muneeb/_superResolution/code/utils.py", line 240, in chop_forward
    output_batch = model(input_batch)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sibt/muneeb/_superResolution/code/model/EDSR.py", line 49, in forward
    x = self.tail(res)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 89, in forward
    input = module(input)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 89, in forward
    input = module(input)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/modules/pixelshuffle.py", line 40, in forward
    return F.pixel_shuffle(input, self.upscale_factor)
  File "/datadrive/anaconda2/lib/python2.7/site-packages/torch/nn/functional.py", line 1688, in pixel_shuffle
    shuffle_out = input_view.permute(0, 1, 4, 2, 5, 3).contiguous()
RuntimeError: cuda runtime error (2) : out of memory at /home/sibt/pytorch/aten/src/THC/generic/THCStorage.cu:58

The Python command is attached below for reference:

python main.py --dir_data /datadrive --scale 4 --n_train 790 --n_val 10 --offset_val 790 --print_model --model EDSR --n_feats 256 --n_resblocks 32 --patch_size 96 --chop_forward --test_only
sanghyun-son commented 6 years ago

Hello.

You can change args.chop_size to handle this issue.

Please try with a smaller chop_size.
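
For reference, the idea behind chop_forward is to split the input into overlapping patches, run the model on each patch separately, and stitch the outputs back together, so the whole image never has to pass through the network at once. Below is a minimal sketch of that idea only (illustrative names, not the exact implementation in utils.py):

import torch

def chopped_forward(model, x, scale, shave=10):
    # Split the LR input into four overlapping quadrants, upscale each,
    # then copy the non-overlapping parts back into one SR output.
    b, c, h, w = x.size()
    h_half, w_half = h // 2, w // 2
    h_size, w_size = h_half + shave, w_half + shave
    patches = [
        x[:, :, 0:h_size, 0:w_size],              # top-left
        x[:, :, 0:h_size, (w - w_size):w],        # top-right
        x[:, :, (h - h_size):h, 0:w_size],        # bottom-left
        x[:, :, (h - h_size):h, (w - w_size):w],  # bottom-right
    ]
    with torch.no_grad():  # use volatile Variables on the old 0.3.x API
        outputs = [model(p) for p in patches]

    # Switch to SR coordinates and stitch the four results together.
    h, w = scale * h, scale * w
    h_half, w_half = scale * h_half, scale * w_half
    h_size, w_size = scale * h_size, scale * w_size

    out = outputs[0].new_zeros(b, outputs[0].size(1), h, w)
    out[:, :, 0:h_half, 0:w_half] = outputs[0][:, :, 0:h_half, 0:w_half]
    out[:, :, 0:h_half, w_half:w] = outputs[1][:, :, 0:h_half, (w_size - w + w_half):w_size]
    out[:, :, h_half:h, 0:w_half] = outputs[2][:, :, (h_size - h + h_half):h_size, 0:w_half]
    out[:, :, h_half:h, w_half:w] = outputs[3][:, :, (h_size - h + h_half):h_size, (w_size - w + w_half):w_size]
    return out

Lowering the threshold at which a patch is forwarded directly (which is what chop_size / min_size appears to control) trades a few extra forward passes for a smaller peak memory footprint.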

Thank you.

muneebaadil commented 6 years ago

I couldn't find any option for chop_size in the command-line arguments.

However, for quick testing, I changed the function prototype to use a smaller chop_size (min_size), like so:

def chop_forward(x, model, scale, shave=10, min_size=2000, nGPUs=1):

But it is still giving the same error.

sanghyun-son commented 6 years ago

I tested your script with a 1080 Ti (11GB RAM) without changing the chop_size option, and I did not get any error. I also checked that the process consumes < 5000MB of RAM. Maybe some other issue is additionally consuming your RAM.

muneebaadil commented 6 years ago

Checked with nvidia-smi and there's no other process that seems to be taking GPU memory. Furthermore, during training, EDSR takes around 6000MB of RAM, so I guess that's fine. Not sure what might be causing the memory error at test time.
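
As an extra cross-check, the allocator's own numbers can be printed right before the test forward pass; a minimal sketch, assuming a PyTorch version that provides these counters (0.4 or later):

import torch

# Values are in bytes; compare them against what nvidia-smi reports.
print('allocated: %.1f MB' % (torch.cuda.memory_allocated() / 1024**2))
print('peak:      %.1f MB' % (torch.cuda.max_memory_allocated() / 1024**2))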

Could you kindly paste your script command here?

sanghyun-son commented 6 years ago

I used the same script you uploaded:

python main.py --dir_data /datadrive --scale 4 --n_train 790 --n_val 10 --offset_val 790 --print_model --model EDSR --n_feats 256 --n_resblocks 32 --patch_size 96 --chop_forward --test_only

Unfortunately, I am not sure what the exact problem is.

Maybe you can try the --precision half argument I have just pushed to use less memory at test time.
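
If the flag is not available in your checkout, roughly the same effect can be reproduced by hand; a minimal sketch of half-precision inference (the conv layer below is only a stand-in for the loaded EDSR model):

import torch
import torch.nn as nn

# Cast both the network and the input to fp16 before the forward pass.
model = nn.Conv2d(3, 3, 3, padding=1).half().cuda()
lr = torch.randn(1, 3, 256, 256).half().cuda()
with torch.no_grad():  # use volatile Variables on the old 0.3.x API
    sr = model(lr)
print(sr.dtype, sr.shape)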

Thank you.

Raymongd007 commented 6 years ago

Hi muneebaadil, may I ask if you have solved the problem? We're running into the same issue during the training step.

muneebaadil commented 6 years ago

Unfortunately, no. I did test the code on another GPU, though, and it runs fine there. So currently my best guess is that it has something to do with CUDA. I'll try reconfiguring CUDA and let you know.

muneebaadil commented 6 years ago

I reconfigured CUDA from scratch, and the problem has vanished. Weird, though. Closing this issue now.

BigChen28 commented 5 years ago

I get the same out-of-memory error (on 11GB of GPU RAM) when running the final model of EDSR, and I also get the same error when I try to change min_size = 2000. However, when I set the batch_size to 10, the problem does not occur.
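
For reference, a training command with a reduced batch size would look something like this (this assumes the repository exposes a --batch_size option, so double-check option.py):

python main.py --dir_data /datadrive --scale 4 --model EDSR --n_feats 256 --n_resblocks 32 --patch_size 96 --batch_size 10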