jdjin3000 / PRG-MoE


CUDA error #4

Closed: MANLP-suda closed this issue 1 year ago

MANLP-suda commented 1 year ago

CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 39.45 GiB total capacity; 37.87 GiB already allocated; 10.25 MiB free; 38.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I find it very easy to exceed the available VRAM. How do you solve this problem?
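For reference, the traceback itself points at one mitigation: since reserved memory (38.30 GiB) far exceeds what is free, PyTorch suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. A minimal sketch; the 128 MiB value is an illustrative guess, not a setting recommended by this repo:

```python
# Sketch: cap the caching allocator's split size, as the OOM message
# suggests. The variable must be set before the first CUDA allocation,
# so set it before importing torch. 128 is a hypothetical value to tune.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the variable so it takes effect
```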

jdjin3000 commented 1 year ago

As you said, the OOM problem occurs frequently in a single-GPU environment. In this situation, you can use the gpus argument, which lets you run on multiple GPUs in parallel. The gpus argument is a list, so you can pass several GPU numbers in it. If you are in a multi-GPU environment, this option solves the problem by spreading VRAM usage across the devices, as in the sketch below.
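A minimal sketch of how a list-valued gpus argument is commonly wired up in PyTorch, assuming data parallelism via torch.nn.DataParallel; the exact mechanism and model in PRG-MoE may differ, and the model here is a placeholder:

```python
# Sketch: split each batch across the GPUs listed in `gpus`, so the
# activations that dominate the 38 GiB footprint are shared among devices.
import torch
import torch.nn as nn

gpus = [0, 1]  # GPU indices, e.g. taken from a gpus CLI argument

model = nn.Linear(768, 2)  # placeholder for the actual PRG-MoE model
if torch.cuda.is_available() and len(gpus) > 1:
    # DataParallel replicates the model and scatters each batch
    # across the listed device ids.
    model = nn.DataParallel(model, device_ids=gpus)
model = model.to(f"cuda:{gpus[0]}")  # parameters live on the first GPU
```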