hanzhanggit / StackGAN-v2

MIT License

Cuda out of memory error #2

Open Lotayou opened 6 years ago

Lotayou commented 6 years ago

I am trying to reproduce the results, but I get stuck on a CUDA out-of-memory error when loading the Inception-v3 model.

I tried both a Windows 10 PC with an Nvidia 1060X graphics card (6 GB) and a Linux server with an Nvidia GeForce Titan graphics card (12 GB). Both times I ran out of memory with the following message:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 144, in <module>
    algo.train()
  File "/backup1/lingboyang/StackGANv2/code/trainer.py", line 666, in train
    self.inception_model, start_count = load_network(self.gpus)
  File "/backup1/lingboyang/StackGANv2/code/trainer.py", line 126, in load_network
    netsD[i] = torch.nn.DataParallel(netsD[i], device_ids=gpus)
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 59, in __init__
    self.module.cuda(device_ids[0])
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
    module._apply(fn)
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
    param.data = fn(param.data)
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/vcl/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/_utils.py", line 69, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THC/generic/THCStorage.cu:58

Is this normal? @hanzhanggit, could you tell me the minimum hardware requirements for running this program? Is there any way to reduce GPU memory usage? Thanks!

shirishr commented 6 years ago

@Lotayou, I tried this: in trainer.py I replaced every Variable declaration such as Variable(x) with Variable(x, volatile=True). See if it works for you.
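For readers on PyTorch 0.4 or later, where the volatile flag no longer has any effect, the closest equivalent is the torch.no_grad() context manager. The sketch below is only an illustration under that assumption: the small network is a hypothetical stand-in, not the StackGAN-v2 generator. Like volatile=True, it disables gradient tracking, so it only suits evaluation or image generation, not training.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; NOT the StackGAN-v2 netG.
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
net.eval()

noise = torch.randn(2, 8)

# Inside no_grad, autograd records no graph, so intermediate
# activations are not kept alive for a backward pass.
with torch.no_grad():
    fake = net(noise)

print(fake.requires_grad)  # False
```

Skipping the autograd graph lowers peak memory during the forward pass, but it does not shrink the model weights themselves.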

ghost commented 6 years ago

@hanzhanggit @Lotayou, did you resolve this issue?

@shirishr, is there any fix for this?

Traceback (most recent call last):
  File "main.py", line 146, in <module>
    algo.evaluate(split_dir)
  File "/home/user/Downloads/StackGAN-v2-master/code/trainer.py", line 874, in evaluate
    fake_imgs, _, _ = netG(noise, t_embeddings[:, i, :])
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA error: out of memory

jiteshpabla commented 5 years ago

Reduce the BATCH_SIZE in cfg/eval_birds.yml to generate images without running out of memory.

shirishr commented 5 years ago

> Reduce the BATCH_SIZE in cfg/eval_birds.yml to generate images without running out of memory.

I reduced the BATCH_SIZE to 9 and it ran on a 4 GB GPU (I also made the Variables volatile). See if that works for you.
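The reason a smaller BATCH_SIZE helps is that only one batch of inputs and activations has to be on the GPU at a time. A minimal sketch of that chunking idea is below; batch_slices is a hypothetical helper written for illustration, not a function from StackGAN-v2, and the total of 20 samples is made up (9 is the batch size reported above).

```python
def batch_slices(total, batch_size):
    """Yield (start, end) index pairs that cover range(total) in chunks."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        # The last chunk may be smaller than batch_size.
        yield start, min(start + batch_size, total)


# Illustrative numbers only: 20 samples processed 9 at a time.
for start, end in batch_slices(20, 9):
    print(start, end)
```

Each (start, end) pair would index one slice of the embeddings to feed the generator, so peak memory scales with the batch size rather than with the full dataset.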