CUDA out of memory error

zhengminlai commented 5 years ago

Hi, Dr. Su, sorry to bother you.

I just ran the training process in PyCharm (PyTorch 1.1 installed with pip 19.1, Python 3.7, and CUDA 9.0 on Windows), following the instructions in README.md, and got a CUDA OOM error. I tried adjusting parameters such as batchSize and num_models, but it didn't help.

Would you mind telling me how to fix this CUDA OOM issue? How much GPU memory did you use in your experiments? Thank you.

Here is the traceback of the CUDA OOM error:


  File "train_mvcnn.py", line 66, in <module>
    trainer.train(30)
  File "C:\Work\CompVision\Code\proj\MVCNN\tools\Trainer.py", line 58, in train
    out_data = self.model(in_data)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Work\CompVision\Code\proj\MVCNN\models\MVCNN.py", line 65, in forward
    y = self.net_1(x)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\pooling.py", line 146, in forward
    self.return_indices)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\_jit_internal.py", line 133, in fn
    return if_false(*args, **kwargs)
  File "C:\Users\Karn\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\functional.py", line 494, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 4.00 GiB total capacity; 2.96 GiB already allocated; 68.48 MiB free; 7.42 MiB cached)
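
The numbers in the last line of the traceback already show why the allocation failed. Below is a minimal sketch that pulls them out and compares the requested size with the reported free memory. The helper `parse_oom_message` is hypothetical (it is not part of PyTorch), and the pattern is only assumed to fit the message wording shown in this traceback.

```python
import re

# Last line of the traceback, copied verbatim from the report above.
message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB "
    "(GPU 0; 4.00 GiB total capacity; 2.96 GiB already allocated; "
    "68.48 MiB free; 7.42 MiB cached)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}

# A size such as "98.00 MiB", optionally followed by one of the known labels.
SIZE_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?) (MiB|GiB)"
    r"(?: (total capacity|already allocated|free|cached))?"
)


def parse_oom_message(text):
    """Hypothetical helper: return the sizes named in the message, in MiB."""
    sizes = {}
    for value, unit, label in SIZE_PATTERN.findall(text):
        # The first size has no label; it is the allocation that failed.
        key = label or "requested"
        sizes[key] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom_message(message)
shortfall = sizes["requested"] - sizes["free"]
print(f"requested {sizes['requested']:.2f} MiB, free {sizes['free']:.2f} MiB")
print(f"shortfall {shortfall:.2f} MiB")
```

The request (98.00 MiB) is larger than the memory reported as free (68.48 MiB) on a card with 4.00 GiB in total, so the allocation cannot succeed at this point in the forward pass.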

jongchyisu commented 5 years ago

I train on a Titan X, which has 12 GB of memory. 4 GB is definitely not enough.
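
To get a rough sense of scale, the following back-of-the-envelope sketch estimates the memory taken by the output of a single convolution layer. Every number in it is an assumption for illustration, not a value taken from this repository: 224x224 inputs, a first layer with 64 output channels (as in VGG-style networks), float32 activations, and 96 images per training step (for example 8 shapes times 12 views).

```python
def activation_mib(num_images, channels, height, width, bytes_per_value=4):
    """Memory of one activation tensor in MiB (float32 by default)."""
    return num_images * channels * height * width * bytes_per_value / 2**20


# Assumed layer shape: 64 channels at 224x224 resolution.
per_image = activation_mib(1, 64, 224, 224)

# Assumed number of images pushed through the network in one step.
images_per_step = 96
per_step = activation_mib(images_per_step, 64, 224, 224)

print(f"one image:  {per_image:.2f} MiB")
print(f"{images_per_step} images: {per_step:.2f} MiB")
```

Under these assumptions a single layer output takes 12.25 MiB per image, or 1176 MiB for 96 images. During training the outputs of the other layers are also kept for the backward pass, in addition to weights, gradients, and optimizer state, so the total grows quickly with the number of images per step.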

TonyYuan114514 commented 2 years ago

Hi, I have exactly the same problem as you, and the traceback of the error is the same too. Have you solved it? Is it caused by insufficient GPU memory?
