[Open] schmacko234 opened this issue 4 years ago
Same error here: RuntimeError: CUDA error: an illegal memory access was encountered
Perhaps try running with a shorter file. What's the GPU memory capacity of your machine? Running on the demo file requires 1520 MiB. This discussion might be helpful: https://discuss.pytorch.org/t/weird-cuda-illegal-memory-access-error/8848/18 Let me know if any of it helped.
I ran it on an RTX 2070 with 8 GB of GDDR6. I tried reducing the number of iterations (python DNP.py --run_name demo --noisy_file demo.wav --samples_dir samples --save_every 50 --num_iter 500 --LR 0.001), but I get the same errors, and I also tried the demo file.
In my case I had to add
Option "Interactive" "0"
to my xorg.conf to avoid the error "Cuda runtime error : the launch timed out and was terminated" (the GPU also drives the display, and the kernel kills the CUDA process if it takes too long to respond), so that might be related.
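For reference, this is roughly where that option goes; the Identifier value and file location are examples, so adjust them to your own setup:

```
Section "Device"
    Identifier "nvidia"
    Driver     "nvidia"
    # Disable the display watchdog timeout for long-running CUDA kernels
    Option     "Interactive" "0"
EndSection
```

Note that with the watchdog disabled, a long CUDA kernel can freeze the display it is driving until it finishes.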
I will try the solutions proposed in the link you mentioned.
I'm having the same problem. I'm not getting the "illegal memory access was encountered" error another user reported above, but it says my GPU is out of memory when it shouldn't be. nvidia-smi shows only about 20 MB of 8 GB in use, yet when I run DNP it reports that 6-7.5 GB are used, and I'm only able to get DNP to work on very small snippets of audio. It must be a driver or code issue. For example:
$ nvidia-smi
Wed Nov 4 15:06:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... On | 00000000:01:00.0 Off | N/A |
| N/A 49C P8 3W / N/A | 11MiB / 8119MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2491 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 2983 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
(DNP) ~/DNP$ python DNP.py --run_name demo --noisy_file s3f4-test.wav --samples_dir samples --save_every 50 --num_iter 5000 --LR 0.001
unet
0%| | 0/5000 [00:00<?, ?it/s]/home/user/anaconda3/envs/DNP/lib/python3.7/site-packages/torch/nn/functional.py:2351: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
Traceback (most recent call last):
File "DNP.py", line 91, in <module>
, save_every=opts.save_every)
File "DNP.py", line 68, in dnp
optimize(model, criterion, input, target, samples_dir, LR, num_iter, sr, save_every, accumulator)
File "DNP.py", line 18, in optimize
out = model(input)
File "/home/user/anaconda3/envs/DNP/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/apoorshlub/DNP/unet.py", line 51, in forward
x = torch.cat([x,encoder[self.num_layers - i - 1]],dim=1)
RuntimeError: CUDA out of memory. Tried to allocate 764.75 MiB (GPU 0; 7.93 GiB total capacity; 6.69 GiB already allocated; 619.12 MiB free; 134.34 MiB cached)
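One thing worth knowing when reading that error: the gap between what nvidia-smi shows at idle and what PyTorch reports is expected, because PyTorch's caching allocator keeps freed blocks reserved for reuse, and the "already allocated" figure counts live tensors rather than the whole process footprint. A quick way to inspect both numbers from inside the process (a sketch; on the older PyTorch versions this repo targets, `memory_reserved` may be named `memory_cached` instead):

```python
import torch

if torch.cuda.is_available():
    # Memory held by live tensors vs. memory the caching allocator has reserved
    print(f"allocated: {torch.cuda.memory_allocated(0) / 2**20:.1f} MiB")
    print(f"reserved:  {torch.cuda.memory_reserved(0) / 2**20:.1f} MiB")
    # Return cached-but-unused blocks to the driver (does not free live tensors)
    torch.cuda.empty_cache()
else:
    print("no CUDA device visible to PyTorch")
```

empty_cache() will make nvidia-smi agree more closely with PyTorch's numbers, but it cannot fix a genuine out-of-memory, since live tensors stay allocated.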
Hi,
as the title says, my GPU ran out of memory. Is there some way to reduce the batch size for CUDA? I can't determine which variables in DNP to change to reduce the batch size.
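If DNP optimizes over the entire waveform at once (which the way memory grows with file length suggests), there is no batch size to lower; memory scales with input length instead. A workaround, not specific to DNP, is to split the audio into fixed-length chunks, denoise each one separately, and concatenate the results. A minimal NumPy sketch, where `denoise` is a hypothetical stand-in for whatever model call you use per chunk:

```python
import numpy as np

def denoise(chunk: np.ndarray) -> np.ndarray:
    """Placeholder for the real per-chunk model call (e.g. one DNP run)."""
    return chunk  # identity here; a real denoiser goes in its place

def denoise_in_chunks(audio: np.ndarray, chunk_len: int) -> np.ndarray:
    """Process a long 1-D signal in fixed-size pieces to bound peak memory."""
    pieces = [denoise(audio[i:i + chunk_len])
              for i in range(0, len(audio), chunk_len)]
    return np.concatenate(pieces)

signal = np.random.randn(44100 * 10).astype(np.float32)   # ~10 s at 44.1 kHz
restored = denoise_in_chunks(signal, chunk_len=44100 * 2)  # 2 s per chunk
assert restored.shape == signal.shape
```

Hard chunk boundaries can leave audible seams, so in practice you may want overlapping chunks with a crossfade at the joins.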