hpcaitech / FastFold

Optimizing AlphaFold Training and Inference on GPU Clusters
Apache License 2.0

CUDA out of memory #115

Open zzy221127 opened 1 year ago

zzy221127 commented 1 year ago

Dear author:

I run FastFold on a machine with 4 GPUs; each GPU has 24 GiB of memory.

I ran inference.py on a FASTA sequence of length 1805 AA (without Triton), with the parameter --gpus 3,

and the error reads:

RuntimeError: CUDA out of memory. Tried to allocate 29.26 GiB (GPU 0; 23.70 GiB total capacity; 9.63 GiB already allocated; 11.79 GiB free; 10.65 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My questions are:

1) Why is only one GPU (GPU 0, rather than GPU 0, GPU 1, and GPU 2) used when computing the total memory? What should I do to get around this?

2) Is there a way to run extremely long FASTA sequences, e.g. 4000 AA?

I appreciate your reply, thank you.

Shenggan commented 1 year ago

I think you can check args.gpus in the code. It should be 3 if you passed the parameter correctly.

AlphaFold's embedding representations take up a lot of memory as the sequence length increases. To reduce memory usage, you should add the parameters --chunk_size [N] and --inplace to the command line or to the shell script ./inference.sh. The smaller you set N, the less memory is used, but it will slow down inference.
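
For reference, a minimal sketch of what such an invocation could look like; the FASTA path, output directory, and any other positional arguments here are placeholders (check inference.py --help for the actual interface), while --gpus, --chunk_size, and --inplace are the flags discussed above:

```bash
# Hypothetical invocation: the paths are placeholders, only the flags are from this thread.
# A smaller --chunk_size lowers peak activation memory at the cost of speed;
# --inplace reuses buffers to further reduce memory usage.
python inference.py target.fasta /path/to/output_dir \
    --gpus 3 \
    --chunk_size 4 \
    --inplace
```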