RuntimeError: CUDA error: out of memory
I had the same issue! I solved it by decreasing the batch size and increasing the snapshot interval in my .yml file. When you set a very large batch size and make the system take snapshots often, it requires too much GPU memory and you can get this type of error. If that does not fix the problem, the only other thing I can suggest is decreasing the branch num in the .yml file, but that will make the network produce lower-resolution pictures (still higher resolution than the original DCGAN, though).
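For reference, these settings live under TRAIN (and TREE for the branch count) in the cfg .yml. Here is a minimal sketch of changing them with PyYAML; the path, key names, and values are assumptions based on the repo's sample configs, so adapt them to your own file:

```python
import yaml

CFG_PATH = 'cfg/eval_birds.yml'  # placeholder: whichever .yml you pass with --cfg

with open(CFG_PATH) as f:
    cfg = yaml.safe_load(f)

# Smaller batches lower the peak GPU memory the most.
cfg.setdefault('TRAIN', {})['BATCH_SIZE'] = 4
# Snapshot less often (larger interval), per the advice above.
cfg['TRAIN']['SNAPSHOT_INTERVAL'] = 5000
# Last resort: fewer branches means lower-resolution output but much less memory.
# cfg.setdefault('TREE', {})['BRANCH_NUM'] = 2

with open(CFG_PATH, 'w') as f:
    yaml.safe_dump(cfg, f, default_flow_style=False)
```

Editing the .yml by hand works just as well; the script only saves re-typing if you sweep several values.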
@vadimfedulov321 - Thanks for the help. How many GPUs did you use? I am trying to run it on my laptop with a GeForce GTX 1050 Ti.
Also, I need some help as I am very new to this: my current batch size is 24 and the snapshot interval is 2000. Can you share the parameters that worked for you?
@sunshinevirgo21 I have only one GPU, an NVIDIA GTX 980. I used batch size 4 and snapshot interval 1000, but I think I could try a higher batch size because it would make training faster. The bigger the batch size, the faster training goes and the more CUDA memory is required.
@vadimfedulov321 - Thank you so much. I was worried about whether I would be able to run this on my machine, as I got this error while using the pre-trained model. Were you able to replicate the results?
@sunshinevirgo21 I used my own dataset and everything went okay; I changed the code a little bit to add it. I was quite pleased with the training results, very impressive!
@vadimfedulov321 - Thanks :+1:
Traceback (most recent call last):
  File "main.py", line 146, in <module>
    algo.evaluate(split_dir)
  File "/home/user/Downloads/StackGAN-v2-master/code/trainer.py", line 874, in evaluate
    fake_imgs, _, _ = netG(noise, t_embeddings[:, i, :])
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Downloads/StackGAN-v2-master/code/model.py", line 275, in forward
    h_code3 = self.h_net3(h_code2, c_code)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/Downloads/StackGAN-v2-master/code/model.py", line 215, in forward
    out_code = self.upsample(out_code)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/modules/batchnorm.py", line 66, in forward
    exponential_average_factor, self.eps)
  File "/home/user/anaconda2/envs/stackGANv2/lib/python2.7/site-packages/torch/nn/functional.py", line 1254, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA error: out of memory
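Since the traceback shows the OOM happening inside netG during evaluate(), one thing worth trying (beyond the config changes above) is to disable autograd and feed the generator smaller chunks of the batch. A minimal sketch, assuming PyTorch 0.4 and that netG returns a list of images (one per branch) plus two other values, as the call in trainer.py suggests; generate_in_chunks, noise, and t_emb are placeholder names:

```python
import torch

def generate_in_chunks(netG, noise, t_emb, chunk=4):
    """Run the generator on small slices of the batch with autograd disabled."""
    outputs = []
    with torch.no_grad():  # no buffers are kept for a backward pass
        for start in range(0, noise.size(0), chunk):
            # Same 3-value unpacking as the evaluate() call in the traceback.
            imgs, _, _ = netG(noise[start:start + chunk], t_emb[start:start + chunk])
            # Keep only the last (highest-resolution) branch and move it off the GPU.
            outputs.append(imgs[-1].cpu())
    return torch.cat(outputs, 0)
```

Chunking like this is only a workaround for evaluation; for training, the batch size in the .yml remains the main knob to turn.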