limbee / NTIRE2017

Torch implementation of "Enhanced Deep Residual Networks for Single Image Super-Resolution"

GPU RAM required for 'test' #4

Closed eturner303 closed 6 years ago

eturner303 commented 7 years ago

Greetings. Thanks for writing this paper and making your code repo available.

I've been attempting to reproduce the DIV2K results. I was able to generate t7 files without issue, and am able to run the 'train' code for the X2 bilinear model (training from scratch).

However, after the first 1000 iterations or so, when the code attempts to do a "test" cycle, it runs out of GPU RAM. How much GPU RAM is required to run the "test" section? Is there any way to reduce the RAM requirement for "test"? I tried dropping the mini-batch size, but this only seems to affect RAM usage in "train".

For now I've disabled the "test" section and am creating a dummy PSNR array to enable training to run, but this makes it impossible to track PSNR during training (all I can see is loss going down).

limbee commented 7 years ago

In principle, our code can test images of arbitrary size on a GPU with any amount of memory. This is possible by adjusting the -chopSize option. The default value (16e4) is tuned for a Titan X; if you have less than 12GB of GPU memory, reduce it until the error disappears. This has only a negligible impact on PSNR.
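
For readers less familiar with the chop mechanism: as I understand it, the test code splits an oversized input into overlapping quadrants, recurses until each piece is under the pixel budget given by -chopSize, and stitches the super-resolved pieces back together. Below is a minimal sketch of that idea in PyTorch-style Python (the repository itself is Torch/Lua; `model`, `scale`, and `shave` are placeholder names, not the repo's identifiers):

```python
import torch

def chop_forward(model, x, scale=2, shave=10, chop_size=16e4):
    # Recursively split x (a 1 x C x H x W torch tensor) into four
    # overlapping quadrants until each piece has at most chop_size pixels,
    # super-resolve the pieces, and stitch the outputs back together.
    _, c, h, w = x.size()
    if h * w <= chop_size:
        with torch.no_grad():
            return model(x)

    h_half, w_half = h // 2, w // 2
    h_crop, w_crop = h_half + shave, w_half + shave   # overlap hides seam artifacts

    pieces = [
        x[:, :, 0:h_crop, 0:w_crop],            # top-left
        x[:, :, 0:h_crop, w - w_crop:w],        # top-right
        x[:, :, h - h_crop:h, 0:w_crop],        # bottom-left
        x[:, :, h - h_crop:h, w - w_crop:w],    # bottom-right
    ]
    outs = [chop_forward(model, p, scale, shave, chop_size) for p in pieces]

    H, W = scale * h, scale * w
    H_half, W_half = scale * h_half, scale * w_half
    H_crop, W_crop = scale * h_crop, scale * w_crop

    y = x.new_zeros(1, c, H, W)   # assumes the model preserves the channel count
    y[:, :, 0:H_half, 0:W_half] = outs[0][:, :, 0:H_half, 0:W_half]
    y[:, :, 0:H_half, W_half:W] = outs[1][:, :, 0:H_half, W_crop - (W - W_half):W_crop]
    y[:, :, H_half:H, 0:W_half] = outs[2][:, :, H_crop - (H - H_half):H_crop, 0:W_half]
    y[:, :, H_half:H, W_half:W] = outs[3][:, :, H_crop - (H - H_half):H_crop, W_crop - (W - W_half):W_crop]
    return y
```

Lowering -chopSize only lowers the per-forward pixel budget, so a large image is simply split into more, smaller pieces; the extra cost is a few more forward passes plus the small overlap regions, which is why the effect on PSNR is negligible.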

scalerhyme commented 7 years ago

Hello, I think I'm getting a similar error when I attempt to reproduce the DIV2K results with fewer feature channels on bicubic_x2 (we are trying to reduce the running time). The error I get is:

THCudaCheck FAIL file=/xxx/torch/extra/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
cuda runtime error (2) : out of memory at /xxxx/torch/extra/cutorch/lib/THC/generic/THCStorage.cu:66

I tried to solve this by changing the -chopSize option, but it doesn't work. Can you give some suggestions? Also, do you have any ideas for reducing the running time of the algorithm while affecting the results as little as possible? Thank you very much.

limbee commented 7 years ago

Hi. How far have you reduced the -chopSize value? If you reduce it sufficiently, for example below 1e4, the out-of-memory error should not occur. If the error persists, check how much GPU memory is actually available.

The best way to reduce running time is to use multiple GPUs with the -nGPU option. cuDNN also helps to speed up inference. Setting -chopSize as large as possible within the error-free range may speed up inference slightly.

scalerhyme commented 7 years ago

Thanks for your answer! I now realize I ran into two memory problems, since I only have 8GB of GPU memory available: the first, in testing, was solved by reducing -chopSize as you mentioned (setting it to 4e4 works); the second was an out-of-memory error in training, which was solved by the -splitBatch option.
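
For context on why batch splitting saves memory: processing each mini-batch as a few smaller sub-batches and accumulating the gradients before a single update keeps the effective batch size (and hence the gradient) unchanged while cutting peak memory per forward/backward pass. Here is a minimal PyTorch-style sketch of that general technique; it is an assumption that -splitBatch in this repository works exactly this way, and the names below are placeholders:

```python
def train_step(model, loss_fn, optimizer, batch_x, batch_y, n_split=2):
    # Process one mini-batch (torch tensors) as n_split smaller chunks and
    # accumulate gradients before a single parameter update. Peak memory
    # drops; the summed gradient matches the full-batch gradient.
    optimizer.zero_grad()
    for x_chunk, y_chunk in zip(batch_x.chunk(n_split), batch_y.chunk(n_split)):
        loss = loss_fn(model(x_chunk), y_chunk) / n_split  # scale so chunks sum to the full-batch loss
        loss.backward()                                    # gradients accumulate in .grad
    optimizer.step()
```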

I also don't think I made the 'reduce running time' question clear: we want to speed up the SR operation as a whole, not only testing. Besides using multiple GPUs, we would also like to try a smaller model/network, so that the amount of computation, and hence the running time, is reduced.

Thanks again!

limbee commented 7 years ago

I'm glad you solved the problem. With the current algorithm, reducing the amount of computation means sacrificing some performance. If the very best performance is not required, perhaps the best option is to reduce the number of feature maps and increase the model depth. If you want to speed up the training procedure without touching the model itself, you can reduce the batch size or the patch size.
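
A quick back-of-the-envelope calculation shows why trimming feature maps is the efficient lever here: the cost of a 3x3 convolution grows roughly quadratically with the number of feature maps but only linearly with depth, so halving the width frees enough computation to add depth back and still come out well ahead. The numbers below are purely illustrative, not measurements of this repository's models:

```python
def conv_flops_per_pixel(n_feats, kernel=3, n_layers=32):
    # Rough multiply-accumulate count per output pixel for a stack of
    # kernel x kernel convolutions with n_feats input and output channels.
    # Ignores head/tail layers and the upsampler; illustration only.
    return n_layers * (kernel * kernel * n_feats * n_feats)

baseline = conv_flops_per_pixel(n_feats=256, n_layers=32)   # a wide, moderately deep stack
slimmer  = conv_flops_per_pixel(n_feats=128, n_layers=48)   # half the width, 1.5x the depth
print(f"slimmer / baseline ~ {slimmer / baseline:.2f}")     # ~0.38: still much cheaper
```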

On the other hand, achieving the same performance with a smaller model is probably a separate research topic, perhaps model compression. If you find more effective network structures or training techniques, please share them someday :) Thank you.