RuntimeError: CUDA out of memory.

uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction

MIT License

RuntimeError: CUDA out of memory. #13

Open ylx6266 opened 2 years ago

ylx6266 commented 2 years ago

RuntimeError: CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 10.75 GiB total capacity; 6.28 GiB already allocated; 1.44 GiB free; 7.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
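The error message hints that fragmentation is only likely when reserved memory is much larger than allocated memory. As a rough sketch of how to check that, the snippet below pulls the GiB figures out of the message and reports the gap. The function name `parse_oom_message` is hypothetical, and the patterns assume the message wording shown above with GiB units; other PyTorch versions may word it differently.

```python
import re

# Error text copied from the report above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.54 GiB (GPU 0; 10.75 GiB total capacity; "
    "6.28 GiB already allocated; 1.44 GiB free; 7.32 GiB reserved in total by PyTorch)"
)


def parse_oom_message(message):
    """Return the GiB figures found in a PyTorch CUDA OOM message as a dict."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            stats[name] = float(match.group(1))
    return stats


stats = parse_oom_message(OOM_MESSAGE)
# Memory the caching allocator holds but has not handed out to tensors.
cached_unused = stats["reserved"] - stats["allocated"]
print(stats)
print(f"reserved but unallocated: {cached_unused:.2f} GiB")
```

For this report the gap is about 1 GiB, smaller than the 1.54 GiB request, so allocator tuning alone may not be enough here.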

fdimaio commented 2 years ago

How large is the structure you are trying to predict?

ylx6266 commented 2 years ago

> How large is the structure you are trying to predict?

The complex I want to predict consists of a protein with 872 residues and a DNA with 12bp.

lilc-112 commented 2 years ago

I had the same problem, and my GPU mem is 8GB. Does it have to run on a multi-GPU server?

FSong2020 commented 2 years ago

I have encountered this same issue. It was running on a 3090 GPU with 24GB mem.

georgkempf commented 2 years ago

Running on a GPU with more memory solved the problem for me when I encountered this error.

ylx6266 commented 2 years ago

> Running on a GPU with more memory solved the problem for me when I encountered this error.

I forced the program to run on the CPU, which avoids this error, but it was too slow.

fdimaio commented 1 year ago

As others have pointed out, the memory requirements for large complexes can be high; using a GPU with more memory (or running on CPU only) may be necessary.

I have a memory-optimized version I will try to push in the next couple of weeks (need to make sure results are the same).

mf-rug commented 1 year ago

More of an FYI: I'm getting CUDA out of memory for a complex of a 505 aa dimer (1010 aa in total) with a 76-base RNA molecule (x2 = 152 in total) on an NVIDIA V100 with 32GB memory, which is somewhat disappointing, as I was hoping to model considerably larger complexes than that in the future.

kcygan commented 1 year ago

If anyone is encountering this error, I just solved it yesterday by running export 'PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:256' and then running the script as usual. I have a 16GB GPU and am trying to solve the structure of a 1030 nt long RNA.

Thanks,

Kamil
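For anyone who launches the prediction from Python rather than a shell, here is a minimal sketch of the same setting applied to a child process. The helper name `env_with_alloc_conf` is hypothetical, and the child command is only a stand-in that echoes the variable; replace it with however you normally start the script.

```python
import os
import subprocess
import sys


def env_with_alloc_conf(max_split_size_mb=256, base_env=None):
    """Return a copy of the environment with PYTORCH_CUDA_ALLOC_CONF set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"max_split_size_mb:{max_split_size_mb}"
    return env


env = env_with_alloc_conf(256)

# Stand-in child process that only echoes the variable; swap in your own
# launch command to run the actual prediction with this environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTORCH_CUDA_ALLOC_CONF'])"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

The variable has to be in the environment before PyTorch initializes CUDA, which is why it is passed to the child process instead of being set after the run has started.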

dwbaron commented 1 year ago

I ran into the same problem. I hope a multi-GPU version will be available in the future.

bifxcore commented 1 year ago

FYI, I tested a system similar to the one described by @mf-rug (2x500aa + 76b RNA) on a 16GB GPU, using @kcygan's recommendation of setting the PYTORCH_CUDA_ALLOC_CONF environment variable. I still got the CUDA out of memory error.

I also tried suggestions from Stack Overflow:

adding garbage_collection_threshold:0.6 to PYTORCH_CUDA_ALLOC_CONF
calling torch.cuda.empty_cache() at the start of the _run_model() function

Alas, none of these helped.
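For reference, the two allocator options mentioned in this thread can be combined in one comma-separated PYTORCH_CUDA_ALLOC_CONF value. This is a small sketch with a hypothetical helper, `build_alloc_conf`; the values 256 and 0.6 are the ones quoted in the thread, not tuned recommendations, and they did not resolve the error in this case.

```python
import os


def build_alloc_conf(**options):
    """Join allocator options into the 'key:value,key:value' form PyTorch reads."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


conf = build_alloc_conf(max_split_size_mb=256, garbage_collection_threshold=0.6)

# Set this before PyTorch initializes CUDA; doing it before `import torch`
# is the safe choice.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = conf
print(conf)
```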

However, running predict.py on 8 CPUs with 64GB memory on an HPC completed in ~5 hours. Good enough for me in this instance.

fglaser commented 10 months ago

Hi, same problem here. I have an NVIDIA GeForce RTX 2080 Ti. I know 11 GB is not a lot, but maybe there is a workaround?

I have many CPUs, but I am unsure how to run the modeling itself on CPUs. Can anybody give accurate instructions, if this is possible?

Thanks!! Fabian

Trying to solve a big protein/RNA complex...
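One general way to push a PyTorch program onto the CPU is to hide the GPUs from CUDA with an empty CUDA_VISIBLE_DEVICES, so that torch.cuda.is_available() returns False. The sketch below builds such an environment. It assumes the prediction script falls back to CPU when no GPU is visible, which is not verified here; `cpu_only_env` is a hypothetical helper, and the thread count of 8 simply mirrors the 8-CPU run reported above.

```python
import os


def cpu_only_env(num_threads, base_env=None):
    """Return a copy of the environment that hides all GPUs from CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ""  # empty string: no CUDA devices visible
    env["OMP_NUM_THREADS"] = str(num_threads)  # cap OpenMP threads used on CPU
    return env


env = cpu_only_env(num_threads=8)
print(repr(env["CUDA_VISIBLE_DEVICES"]), env["OMP_NUM_THREADS"])
```

Pass `env` to `subprocess.run(..., env=env)` with your usual launch command, or export the same two variables in the shell before running the script.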

sherryliu987 commented 7 months ago

If you're struggling to run RoseTTAFold2NA locally, feel free to try https://www.tamarind.bio/rosettafold2na. Tamarind is an online platform for bioinformatics tools that offers structural biology workflows, including RoseTTAFold2NA, for free.