hanzhanggit / StackGAN-v2

MIT License
843 stars, 190 forks

CUDA out of memory? How to resolve this issue? #15

Closed: ghost closed this issue 5 years ago

ghost commented 6 years ago

Traceback (most recent call last):
  File "main.py", line 146, in <module>
    algo.evaluate(split_dir)
  File "/home/user/Downloads/StackGAN-v2-master/code/trainer.py", line 874, in evaluate
    fake_imgs, _, _ = netG(noise, t_embeddings[:, i, :])
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Downloads/StackGAN-v2-master/code/model.py", line 275, in forward
    h_code3 = self.h_net3(h_code2, c_code)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Downloads/StackGAN-v2-master/code/model.py", line 215, in forward
    out_code = self.upsample(out_code)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/batchnorm.py", line 66, in forward
    exponential_average_factor, self.eps)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/functional.py", line 1254, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA error: out of memory

vadimfedulov256 commented 6 years ago

RuntimeError: CUDA error: out of memory

I had the same issue! I solved it by decreasing the batch size and increasing the snapshot interval in my .yml file. A very large batch size combined with frequent snapshots requires too much GPU memory, which can produce this type of error. If that does not fix the problem, the only other thing I can suggest is decreasing the branch num in the .yml file. That will train the network to produce lower-resolution pictures, although they will still be higher resolution than with the original DCGAN.
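As a rough illustration of the settings mentioned above, here is a minimal Python sketch that applies them to a config held as a nested dict. The key names (`TRAIN.BATCH_SIZE`, `TRAIN.SNAPSHOT_INTERVAL`, `TREE.BRANCH_NUM`, `TREE.BASE_SIZE`) are assumptions about how the .yml file is laid out, the helper functions are hypothetical, and the resolution formula (base size doubling with each extra branch) is also an assumption rather than something confirmed in this thread.

```python
import copy


def lower_memory_config(cfg, batch_size, snapshot_interval, branch_num=None):
    """Return a copy of cfg with memory-related settings overridden.

    The key names used here are assumed, not taken from the thread.
    """
    new_cfg = copy.deepcopy(cfg)
    new_cfg["TRAIN"]["BATCH_SIZE"] = batch_size
    new_cfg["TRAIN"]["SNAPSHOT_INTERVAL"] = snapshot_interval
    if branch_num is not None:
        # Fewer branches means a lower final image resolution.
        new_cfg["TREE"]["BRANCH_NUM"] = branch_num
    return new_cfg


def final_resolution(cfg):
    """Assumed rule: each extra branch doubles the image side length."""
    return cfg["TREE"]["BASE_SIZE"] * 2 ** (cfg["TREE"]["BRANCH_NUM"] - 1)


# Hypothetical starting values, chosen only for illustration.
original = {
    "TRAIN": {"BATCH_SIZE": 24, "SNAPSHOT_INTERVAL": 2000},
    "TREE": {"BRANCH_NUM": 3, "BASE_SIZE": 64},
}
reduced = lower_memory_config(
    original, batch_size=4, snapshot_interval=4000, branch_num=2
)

print(final_resolution(original), final_resolution(reduced))
```

Under these assumptions, dropping one branch halves the side length of the generated images, which is the trade-off described above.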

ghost commented 5 years ago

@vadimfedulov321 - Thanks for the help. How many GPUs did you use? I am trying to run it on my laptop, which has a GeForce GTX 1050 Ti.

ghost commented 5 years ago

Also, I need help as I am very new to this. The current batch size is 24 and the snapshot interval is 2000. Can you share the parameters that worked for you?

vadimfedulov256 commented 5 years ago

@sunshinevirgo21 I have only one GPU, an NVIDIA GTX 980. I used batch size 4 and snapshot interval 1000, but I think I can try a higher batch size because it should make learning faster. The bigger the batch size, the faster learning goes and the more CUDA memory is required.
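To make the link between batch size and memory concrete, here is a back-of-the-envelope Python sketch. It counts the bytes of a single float32 activation tensor only; the helper name and the 3 x 256 x 256 shape are illustrative assumptions, and real usage is much higher because every layer keeps its own activations, plus weights and optimizer state.

```python
def activation_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Bytes needed for one dense activation tensor (float32 by default)."""
    return batch_size * channels * height * width * bytes_per_value


# Compare the two batch sizes mentioned in this thread on one
# hypothetical 3 x 256 x 256 image tensor.
for batch_size in (24, 4):
    size_mib = activation_bytes(batch_size, 3, 256, 256) / (1024.0 * 1024.0)
    print("batch size %2d: %.1f MiB for one image tensor" % (batch_size, size_mib))
```

The per-tensor cost grows linearly with batch size, so going from 24 to 4 cuts this part of the footprint by a factor of six.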

ghost commented 5 years ago

@vadimfedulov321 - Thank you so much. I was worried about whether I would be able to run this on my machine, as I got this error while using the pre-trained model. Were you able to replicate the results?
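Since the error came up during evaluation with the pre-trained model, one option in the same spirit as lowering the batch size is to feed the generator fewer samples at a time. The sketch below is a hypothetical helper, not part of the repository: it splits any sequence that supports `len()` and slicing into smaller chunks, and the caption list is made-up example data.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of items with at most chunk_size elements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Made-up inputs standing in for a large evaluation batch.
captions = ["caption %d" % i for i in range(10)]
chunks = list(iter_chunks(captions, 4))

for chunk in chunks:
    # In a real evaluation loop, the generator would be called here
    # on this smaller chunk instead of on the whole batch at once.
    print(len(chunk))
```

Whether this is enough depends on how much memory the model itself needs; it only reduces the part that scales with the number of samples processed together.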

vadimfedulov256 commented 5 years ago

@sunshinevirgo21 I used my own dataset and everything went okay. I changed the code a little to add my own dataset. I am still quite pleased with the results of training; they are very impressive!

ghost commented 5 years ago

@vadimfedulov321 - Thanks :+1: