xinntao / ESRGAN

ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.
https://github.com/xinntao/BasicSR
Apache License 2.0

Out of VRAM Memory CUDA Error #86

Open ctamv opened 4 years ago

ctamv commented 4 years ago

Good evening everyone :)

This is my first time posting here and I'm still a complete Python/Deep Learning/AI newbie, so apologies in advance if this thread ends up wasting your time.

Anyway, a few days ago I managed to successfully install the ESRGAN upscaler (https://github.com/xinntao/ESRGAN) along with all of its prerequisites.

I tested my first image with my new RTX 2070 Super and the result came out in an astounding 4 seconds (considering that the last time I tried it on a CPU it took 7 hours).

Unfortunately, when I tried to replicate the process, the following error popped up in the Anaconda Prompt, and I would really appreciate any help resolving it:

Traceback (most recent call last):
  File "test.py", line 34, in <module>
    output = model(img_LR).data.squeeze().float().cpu().clamp_(0, 1).numpy()
  File "C:\Users\ctamv\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\2414FC7A.Viber_p61zvh252yqyr!Viber.Universal\ESRGAN-master\RRDBNet_arch.py", line 75, in forward
    fea = self.lrelu(self.upconv2(F.interpolate(fea, scale_factor=2, mode='nearest')))
  File "C:\Users\ctamv\Anaconda3\lib\site-packages\torch\nn\functional.py", line 2500, in interpolate
    return torch._C._nn.upsample_nearest2d(input, _output_size(2))
RuntimeError: CUDA out of memory. Tried to allocate 5.21 GiB (GPU 0; 8.00 GiB total capacity; 3.01 GiB already allocated; 2.66 GiB free; 336.43 MiB cached)

I have been trying for hours to solve this problem after visiting multiple other threads, but with no success (mostly because I don't even know where to input PyTorch commands in the first place, as the Anaconda Prompt doesn't let me run them...)

Finally, one last observation: according to the last line of the above output, even though my GPU's full 8 GB of VRAM are correctly displayed, along with the roughly 3 GB currently allocated and the cache, the remaining 2.66 GB of free VRAM look wrong, since by basic arithmetic there should have been at least 4.6 GB of free VRAM if the above numbers are correct...

Apologies again for my general ignorance on the subject, and thanks in advance for your help :)
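
As a side note on those numbers: the "free" figure in the error message only counts what the CUDA driver can still hand out; it excludes memory used by other processes (on Windows the desktop itself takes a share) and blocks already reserved by PyTorch's caching allocator, so it can legitimately be lower than total minus allocated. PyTorch commands also have to be run from inside a Python script or interpreter, not typed into the Anaconda Prompt directly. Below is a minimal sketch, assuming a reasonably recent PyTorch with CUDA available, for printing the allocator's own bookkeeping from inside something like test.py:

    # Minimal sketch (not part of the ESRGAN repo): these calls go inside a Python
    # script such as test.py, not into the Anaconda Prompt directly.
    import torch

    device = torch.device('cuda')
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)  # memory held by live tensors
    reserved = torch.cuda.memory_reserved(device)    # memory held by the caching allocator
    print(f'total {total / 1e9:.2f} GB | allocated {allocated / 1e9:.2f} GB | reserved {reserved / 1e9:.2f} GB')

    # Returns cached-but-unused blocks to the driver; it does not help when a single
    # allocation is simply larger than what the card can provide.
    torch.cuda.empty_cache()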

Josee9988 commented 4 years ago

Same issue here

ahbon123 commented 4 years ago

Same issue here, please help. Thanks.

Traceback (most recent call last):
  File "test.py", line 34, in <module>
    output = model(img_LR).data.squeeze().float().cpu().clamp_(0, 1).numpy()
  File "/home/x/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/x/文档/GitHub/image-resolution/ESRGAN/RRDBNet_arch.py", line 75, in forward
    fea = self.lrelu(self.upconv2(F.interpolate(fea, scale_factor=2, mode='nearest')))
  File "/home/x/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/x/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 419, in forward
    return self._conv_forward(input, self.weight)
  File "/home/x/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 415, in _conv_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 2.79 GiB (GPU 0; 7.80 GiB total capacity; 3.74 GiB already allocated; 1.75 GiB free; 4.79 GiB reserved in total by PyTorch)

SvnGms commented 4 years ago

Hey, this is a common issue with your hardware. As the message states, your graphics card does not have enough free memory. You can try:

  1. buying a graphics card with more memory, or
  2. using a smaller input image (lower resolution), e.g. by downscaling it before running test.py (see the sketch after this list).
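
A rough illustration of option 2, assuming OpenCV is installed (ESRGAN's test.py already reads images with cv2); the file names and the 0.5 scale factor are just placeholders:

    # Shrink the input before it ever reaches the network, then point test.py at the
    # downscaled copy. Hypothetical paths; adjust to wherever your LR images live.
    import cv2

    img = cv2.imread('LR/my_image.png', cv2.IMREAD_COLOR)
    img = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    cv2.imwrite('LR/my_image_small.png', img)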

A more advanced option would be to run the model at lower precision: if it is currently using float32, switching to float16 (half precision) may roughly halve the memory required. (I am not sure whether ESRGAN supports this out of the box; you might have to modify the model, and possibly retrain, for it to work.)
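
For pure inference, half precision can often be tried without retraining by casting both the weights and the input to float16. A minimal sketch along the lines of the repo's test.py; untested with the pretrained ESRGAN weights, so output quality is not guaranteed, and the model/image paths are examples:

    # Half-precision inference sketch, mirroring the preprocessing in test.py.
    import cv2
    import numpy as np
    import torch
    import RRDBNet_arch as arch  # architecture file shipped with ESRGAN

    device = torch.device('cuda')
    model = arch.RRDBNet(3, 3, 64, 23, gc=32)
    model.load_state_dict(torch.load('models/RRDB_ESRGAN_x4.pth'), strict=True)
    model = model.eval().half().to(device)  # cast weights to float16

    # BGR uint8 -> float tensor in [0, 1], NCHW layout, as in test.py.
    img = cv2.imread('LR/my_image.png', cv2.IMREAD_COLOR).astype(np.float32) / 255.0
    img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).unsqueeze(0)

    with torch.no_grad():
        output = model(img.half().to(device)).float().cpu().clamp_(0, 1)

If float16 turns out to be numerically unstable for the RRDB blocks, downscaling the input (option 2 above) is the safer fallback.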