morawi / TextGAN

Unsupervised Text Segmentation using CycleGAN

RuntimeError: CUDA out of memory. #1

Open prashantg445 opened 4 years ago

prashantg445 commented 4 years ago

I had 7 GB of GPU VRAM and around 13 GB of RAM available, with the training configuration listed under "Experiment parameters" below.

I placed my training data outside the repo, at these relative paths (as per cyclegan_text.py):

../data//train/A/ ../data//train/B/

The test files are placed in the same way. I also commented out line 56 in cyclegan_text.py:

opt.dataset_name = 'text_segmentation' + str(opt.img_width)

#############

But when I run python3 cyclegan_text.py, I get the following error trace:

Experiment parameters Namespace(AMS_grad=True, aligned=False, b1=0.5, b2=0.999, batch_size=16, batch_test_size=1, channels=3, checkpoint_interval=5, data_mode='', dataset_name='mandate', decay_epoch=10, epoch=0, experiment_name='mandate-Apr-5', img_height=80, img_width=500, lambda_GAN_AB=tensor(1., device='cuda:0'), lambda_GAN_BA=tensor(1., device='cuda:0'), lambda_cycle_A=tensor(10., device='cuda:0'), lambda_cycle_B=tensor(10., device='cuda:0'), lambda_id_A=tensor(5., device='cuda:0'), lambda_id_B=tensor(5., device='cuda:0'), lr=0.0002, n_cpu=8, n_epochs=20, n_residual_blocks=9, p_RGB2BGR_augment=0, p_invert_augment=0, sample_interval=100, seed_value=12345, show_progress_every_n_iterations=20, test_interval=10, use_F1_loss=False, use_whollyG=False)

../data/mandate/train/B/.
../data/mandate/test/B/.
Traceback (most recent call last):
  File "cyclegan_text.py", line 267, in <module>
    loss_id_B = criterion_identity_B(G_AB(real_B), real_B)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/TextGAN/models.py", line 76, in forward
    return self.model(x)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/tft-ml/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tft-ml/TextGAN/models.py", line 33, in forward
    return x + self.conv_block(x)
RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 7.80 GiB total capacity; 5.90 GiB already allocated; 27.06 MiB free; 6.64 GiB reserved in total by PyTorch)
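The numbers in the RuntimeError message can be broken down to see where the 7.80 GiB went. The following is a minimal sketch in plain Python that parses the message quoted above; the helper name `parse_oom` and the two derived quantities are illustrative and not part of TextGAN or PyTorch.

```python
import re

# The error message exactly as quoted in the trace above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 7.80 GiB total capacity; "
    "5.90 GiB already allocated; 27.06 MiB free; 6.64 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# One pattern per quantity reported in the message.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
    "total": r"([\d.]+) (MiB|GiB) total capacity",
    "allocated": r"([\d.]+) (MiB|GiB) already allocated",
    "free": r"([\d.]+) (MiB|GiB) free",
    "reserved": r"([\d.]+) (MiB|GiB) reserved",
}


def parse_oom(message):
    """Hypothetical helper: return the sizes quoted in the message, in MiB."""
    sizes = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom(OOM_MESSAGE)

# Memory PyTorch has reserved from the driver but that is not backing live tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

# Memory that is neither free nor reserved by this PyTorch process
# (for example the CUDA context or other processes on the same GPU).
outside_pytorch = sizes["total"] - sizes["reserved"] - sizes["free"]

print(f"requested:        {sizes['requested']:.2f} MiB")
print(f"free on device:   {sizes['free']:.2f} MiB")
print(f"cached, unused:   {cached_unused:.2f} MiB")
print(f"outside PyTorch:  {outside_pytorch:.2f} MiB")
```

Under this reading, the 40 MiB request exceeds the 27.06 MiB that is free on the device, while roughly 758 MiB is reserved by PyTorch without being allocated to tensors and roughly 1161 MiB is in use outside this process's allocator. Why the reserved-but-unallocated memory could not serve the request (fragmentation is one possibility) cannot be determined from the message alone.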

I am watching nvidia-smi, but the GPU does not appear to be fully utilized, either by this process or by any other process.
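When reading nvidia-smi, it may help to look at memory use and compute utilization as separate figures, since an out-of-memory error concerns the former. The sketch below uses nvidia-smi's CSV query mode; the function names and the sample line are hypothetical and the sample values are not taken from this issue.

```python
import shutil
import subprocess

# nvidia-smi's query mode; with "nounits", memory is reported in MiB
# and utilization in percent.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_rows(csv_text):
    """Parse 'used, total, util' lines into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total, util = (float(field) for field in line.split(","))
        rows.append({
            "used_mib": used,
            "total_mib": total,
            "memory_percent": 100.0 * used / total,
            "util_percent": util,
        })
    return rows


def query_gpus():
    """Run nvidia-smi if it is installed; return an empty list otherwise."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_rows(result.stdout)


# Hypothetical sample line (not from the issue): memory is nearly full
# even though compute utilization is low.
sample_rows = parse_gpu_rows("7900, 8000, 3\n")
for row in sample_rows:
    print(f"memory: {row['memory_percent']:.1f}% used, compute: {row['util_percent']:.0f}% busy")
```

On a machine with an NVIDIA driver, `query_gpus()` returns the same structure for the live device. Fields that the driver reports as unavailable would not parse as numbers in this sketch.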

Things I tried that didn't work:

  1. torch.cuda.empty_cache()
  2. Downgrading torch to 0.4.0, 1.0.0, and 1.1.0

morawi commented 4 years ago

Sorry for not replying earlier. @prashantg445, it can be difficult to tell from nvidia-smi whether the GPU is fully utilized. Try scaling the images down to 128x128 and see whether that works. What is your image resolution, by the way? Also, torch==1.4.0 should work out of the box.
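To get a rough feel for how much the suggested 128x128 size would help, the size of a single float32 activation tensor can be estimated from the parameters in the Namespace dump (batch_size=16, img_height=80, img_width=500). This is only a back-of-the-envelope sketch: the 256 channels and the 4x spatial downsampling inside the residual blocks are assumptions about models.py, not something confirmed in this thread.

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def activation_mib(batch_size, channels, height, width):
    """Size in MiB of one float32 tensor of shape (batch, channels, height, width)."""
    return batch_size * channels * height * width * BYTES_PER_FLOAT32 / MIB


# Values taken from the Namespace dump in the issue.
BATCH_SIZE = 16
IMG_HEIGHT, IMG_WIDTH = 80, 500

# ASSUMPTIONS about the generator's residual blocks (not confirmed by the thread):
ASSUMED_CHANNELS = 256      # feature maps inside each residual block
ASSUMED_DOWNSAMPLE = 4      # spatial reduction before the residual blocks

current = activation_mib(
    BATCH_SIZE, ASSUMED_CHANNELS,
    IMG_HEIGHT // ASSUMED_DOWNSAMPLE, IMG_WIDTH // ASSUMED_DOWNSAMPLE,
)
proposed = activation_mib(
    BATCH_SIZE, ASSUMED_CHANNELS,
    128 // ASSUMED_DOWNSAMPLE, 128 // ASSUMED_DOWNSAMPLE,
)

print(f"80x500 input:  {current:.2f} MiB per activation tensor")
print(f"128x128 input: {proposed:.2f} MiB per activation tensor")
print(f"reduction:     {current / proposed:.2f}x")
```

Under these assumptions, one activation tensor at 80x500 is about 39 MiB, which is in line with the 40.00 MiB allocation that failed, and 128x128 inputs would shrink each such tensor by a factor of about 2.4. Training keeps many of these tensors alive for the backward pass, so total memory would be expected to scale in a similar way; the same function can be called with a smaller batch_size to compare.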