mseitzer / srgan

PyTorch implementation of "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network"
MIT License
43 stars 14 forks

Out of memory #2

Closed Wisgon closed 6 years ago

Wisgon commented 6 years ago

Hi, thanks for this work, first of all. I got an out of memory error. I have a GTX 1070 Ti GPU with 8GB of memory, so how much memory does this app need?

Wisgon commented 6 years ago

I set up my environment following the guide in the readme. When I run `./eval.py -i configs/srresnet.json resources/pretrained/srresnet.pth path/to/image.jpg` and `./eval.py -i configs/srgan.json resources/pretrained/srgan.pth path/to/image.jpg`, they both run out of memory. Here is the error output:

```
Running on GPU 0
Restored checkpoint from resources/pretrained/srgan.pth
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "eval.py", line 157, in <module>
    main(sys.argv[1:])
  File "eval.py", line 120, in main
    data = runner.infer(loader)
  File "/home/jhd/face_recognition/softwares/srgan/training/baserunner.py", line 128, in infer
    _, data = self._val_step(loader, compute_metrics=False)
  File "/home/jhd/face_recognition/softwares/srgan/training/adversarial_runner.py", line 294, in _val_step
    prediction = self.gen(inp)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jhd/face_recognition/softwares/srgan/models/srresnet.py", line 195, in forward
    x = self.upsample(x + initial)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/home/jhd/face_recognition/anaconda3/envs/srgan/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1512386481460/work/torch/lib/THC/generic/THCStorage.cu:58
```

mseitzer commented 6 years ago

I guess you used an image which is too large.

This network is pretty memory intensive. For upscaling a 512x512 image, I saw a peak memory usage of 12GB.

I don't know exactly what PyTorch needs to hold in memory. I guess a lower bound on the memory the network needs is `2 * 16 * width * height * 256 * 4` bytes, which is based on the output size of the largest feature map the network computes (and which it needs to hold twice, as input and output). In addition, the network parameters must be held in memory, which adds another couple hundred megabytes.
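The estimate above can be sketched as a quick back-of-the-envelope calculation. This is only a sketch of the reasoning in the comment, not code from this repo; the constants assume a 4x upscaling factor (16x as many pixels), 256-channel feature maps, and float32 activations.

```python
def estimate_activation_bytes(width, height, upscale=4, channels=256,
                              bytes_per_float=4, copies=2):
    """Rough lower bound on activation memory for the largest feature map.

    The largest feature map has (upscale ** 2) * width * height spatial
    positions with `channels` channels, and is held twice (as the input
    and the output of the layer that produces it).
    """
    scale = upscale ** 2  # 4x upscaling in each dimension -> 16x the pixels
    return copies * scale * width * height * channels * bytes_per_float

# For a 512x512 input: 2 * 16 * 512 * 512 * 256 * 4 bytes ~= 8.6 GB,
# roughly consistent with the ~12 GB peak observed once parameters and
# framework workspace memory are added on top.
print(f"{estimate_activation_bytes(512, 512) / 1e9:.1f} GB")  # -> 8.6 GB
```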

If you have enough RAM, you can try upscaling on your CPU using the `-c ''` switch. This will, of course, take longer than on the GPU. Another option would be to crop the image into parts and upscale each part individually.
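The crop-and-upscale idea could look like the following sketch. This helper is hypothetical and not part of this repo; `upscale_fn` stands in for whatever runs the generator on a single tile (e.g. a wrapper around `eval.py`'s model call).

```python
def tile_boxes(width, height, tile=256):
    """Yield (left, top, right, bottom) crop boxes covering the image."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield (left, top, min(left + tile, width), min(top + tile, height))

def upscale_tiled(image, upscale_fn, scale=4, tile=256):
    """Upscale a PIL image tile by tile to cap peak memory usage.

    `upscale_fn` is assumed to take a PIL tile and return it upscaled
    by `scale` in each dimension.
    """
    from PIL import Image
    out = Image.new(image.mode, (image.width * scale, image.height * scale))
    for (l, t, r, b) in tile_boxes(image.width, image.height, tile):
        out.paste(upscale_fn(image.crop((l, t, r, b))), (l * scale, t * scale))
    return out
```

Note that without overlapping the tiles and blending, visible seams can appear at tile borders, since each tile is super-resolved without context from its neighbors.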

Wisgon commented 6 years ago

Thanks for the reply, it works when I use a smaller picture! Thank you very much.