uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction

RuntimeError: CUDA out of memory. #40

Open akashbahai opened 1 year ago

akashbahai commented 1 year ago

Hi, thanks for creating this tool and providing the code. I am able to run the prediction on the example sequences, but when I try to make a prediction for my own use case, I run into CUDA out-of-memory errors.

Here's the entire error message:

Traceback (most recent call last):
  File "/home/akash.bahai/RoseTTAFold2NA/network/predict.py", line 346, in <module>
    pred.predict(inputs=args.inputs, out_prefix=args.prefix, ffdb=ffdb)
  File "/home/akash.bahai/RoseTTAFold2NA/network/predict.py", line 226, in predict
    self._run_model(Ls, msa_orig, ins_orig, t1d, t2d, xyz_t, xyz_t[:,0], alpha_t, "%s%02d"%(out_prefix, i_trial))
  File "/home/akash.bahai/RoseTTAFold2NA/network/predict.py", line 270, in _run_model
    logit_s, logit_aa_s, logit_pae, init_crds, alpha_prev, _, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/RoseTTAFold2NA/network/RoseTTAFoldModel.py", line 93, in forward
    pair, state = self.templ_emb(t1d, t2d, alpha_t, xyz_t, pair, state, use_checkpoint=use_checkpoint)
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/RoseTTAFold2NA/network/Embeddings.py", line 190, in forward
    templ = self.templ_stack(templ, xyz_t, use_checkpoint=use_checkpoint) # (B, T, L,L, d_templ)
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/RoseTTAFold2NA/network/Embeddings.py", line 132, in forward
    templ = self.block[i_block](templ, rbf_feat)
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/RoseTTAFold2NA/network/Track_module.py", line 95, in forward
    pair = pair + self.drop_row(self.row_attn(pair, rbf_feat))
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/RoseTTAFold2NA/network/Attention_module.py", line 453, in forward
    pair = self.norm_pair(pair)
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
    return F.layer_norm(
  File "/home/akash.bahai/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/functional.py", line 2503, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 4.01 GiB (GPU 0; 31.75 GiB total capacity; 22.47 GiB already allocated; 3.67 GiB free; 27.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This error occurs in the last step of the pipeline, i.e. the end-to-end prediction. I am using a V100 with 32 GB of memory. The RNA is ~1300 nt long and the protein is ~700 amino acids. Could you please estimate how much GPU memory such a use case requires?
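For a rough feel of why long inputs fail: the traceback stops in the template pair features, which the code comment in Embeddings.py annotates as shape (B, T, L, L, d_templ), so memory for tensors of this kind grows with the square of the total length L (protein plus RNA, about 2000 here). The sketch below only sizes one such tensor. The channel count and template count are made-up illustration values, not numbers taken from RoseTTAFold2NA, and real peak usage involves many such activations at once, so this is not an estimate of the total GPU memory needed.

```python
def pair_tensor_gib(length, channels, n_templates=1, bytes_per_element=4):
    """Size in GiB of one (n_templates, L, L, channels) tensor.

    bytes_per_element=4 corresponds to float32.
    """
    n_elements = n_templates * length * length * channels
    return n_elements * bytes_per_element / 2**30


# Hypothetical channel and template counts, chosen only for illustration.
for total_length in (500, 1000, 2000, 4000):
    size = pair_tensor_gib(total_length, channels=64, n_templates=4)
    print(f"L={total_length:5d}: {size:7.2f} GiB for one pair-shaped tensor")
```

The useful takeaway is the scaling rather than the absolute numbers: doubling the total length quadruples the size of every L x L tensor.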

GanQiao1990 commented 8 months ago

Did you solve this problem? Even after we bought a 40 GB NVIDIA GPU, we still hit the same error.

ChrisLou-bioinfo commented 7 months ago

Same issue here.

akashbahai commented 7 months ago

Hi @ChrisLou-bioinfo, how long is the sequence that you are trying to predict? Does it work for shorter sequences?

@GanQiao1990 Are you able to run it now? I could get it working for shorter sequences, but it didn't work for longer ones (>1000 residues).

ChrisLou-bioinfo commented 7 months ago

The example sequence works.

My input: RNA (1485 nt), protein (348 aa).

I tried setting the max_split_size_mb parameter to 24, but it did not help.

I am now trying to run on the CPU.
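On the max_split_size_mb setting mentioned above: PyTorch reads it from the PYTORCH_CUDA_ALLOC_CONF environment variable, and it only changes how the caching allocator splits blocks to reduce fragmentation. It cannot provide memory that the activations themselves need, which may be why it had no visible effect here. A minimal sketch of setting it from Python follows; the helper names and the value 128 are illustrative assumptions, and the variable has to be set before the first CUDA allocation (simplest: before importing torch).

```python
import os


def set_alloc_conf(max_split_size_mb):
    """Set PYTORCH_CUDA_ALLOC_CONF and return the value that was written.

    Hypothetical helper; call it before the first CUDA allocation.
    """
    value = f"max_split_size_mb:{int(max_split_size_mb)}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


def unused_reserved_gib(reserved_gib, allocated_gib):
    """Memory PyTorch has reserved from the driver but not handed to tensors."""
    return reserved_gib - allocated_gib


set_alloc_conf(128)  # 128 is an arbitrary example value, not a recommendation

# Numbers copied from the error message at the top of this issue.
gap = unused_reserved_gib(reserved_gib=27.22, allocated_gib=22.47)
print(f"{gap:.2f} GiB reserved but not allocated")
```

The gap between reserved and allocated memory is the only part this setting can help reclaim, so it is worth comparing that gap with the size of the failed allocation before spending time tuning the value.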

GanQiao1990 commented 7 months ago

@akashbahai Yes, it runs with shorter sequences, but long sequences still run out of memory.

ChrisLou-bioinfo commented 7 months ago

I am running it on the CPU now and there are no errors, although it may take longer. I hope the authors can update the tool so that it automatically falls back to the CPU after an OOM.
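The fallback requested here could look roughly like the sketch below. It is not part of RoseTTAFold2NA: `run_with_cpu_fallback` and `is_cuda_oom` are hypothetical helpers, and `run_prediction` stands in for whatever function builds the model on a given device and runs it. The only PyTorch behaviour relied on is that a CUDA out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory", as in the traceback above.

```python
def is_cuda_oom(error):
    """True if the exception looks like a CUDA out-of-memory error."""
    return isinstance(error, RuntimeError) and "out of memory" in str(error)


def run_with_cpu_fallback(run_prediction):
    """Call run_prediction("cuda:0"); on CUDA OOM, retry once with "cpu"."""
    try:
        return run_prediction("cuda:0")
    except RuntimeError as error:
        if not is_cuda_oom(error):
            raise  # unrelated failure: do not hide it
        print("CUDA out of memory, retrying on CPU (this will be slow)")
    # Retry outside the except block so the failed attempt's traceback,
    # which can keep GPU tensors alive, has been released first.
    return run_prediction("cpu")


def fake_prediction(device):
    """Stand-in for a real model call, used only to exercise the helper."""
    if device.startswith("cuda"):
        raise RuntimeError("CUDA out of memory. Tried to allocate 4.01 GiB")
    return f"finished on {device}"


print(run_with_cpu_fallback(fake_prediction))
```

In a real run the retry would also need to rebuild or move the model and inputs to the CPU, and enough host RAM must be available for the same activations.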

akashbahai commented 7 months ago

@GanQiao1990 In that case, you'll probably need a GPU with larger memory.

@ChrisLou-bioinfo I don't think the method can know beforehand which mode it should use. It uses the default mode unless you explicitly specify CPU or GPU. It should be possible to select the mode beforehand based on the length of the input sequences, perhaps with a simple if condition.
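A minimal sketch of such an if condition, assuming the device can be passed into the prediction code as a string. `choose_device` is a hypothetical helper, and the 1000-residue threshold simply mirrors the lengths reported as failing in this thread; the right cutoff depends on the GPU and would have to be found by trial.

```python
def choose_device(sequence_lengths, max_gpu_length, cuda_available):
    """Pick "cuda:0" for inputs short enough for the GPU, otherwise "cpu".

    sequence_lengths: lengths of all chains (protein and nucleic acid).
    max_gpu_length:   largest total length assumed to fit on the GPU.
    cuda_available:   in practice the result of torch.cuda.is_available().
    """
    total_length = sum(sequence_lengths)
    if cuda_available and total_length <= max_gpu_length:
        return "cuda:0"
    return "cpu"


# The case from the first post: ~700 aa protein plus ~1300 nt RNA.
device = choose_device([700, 1300], max_gpu_length=1000, cuda_available=True)
print(device)
```

Taking `cuda_available` as an argument keeps the decision logic testable on machines without a GPU.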

AngelaAmari commented 7 months ago

OK, I even tried an A100 with 80 GB of VRAM, and it still does not work for a ~5k bp RNA and a ~500 aa protein. If we chunk the sequences, will we have to redo the sequence prep?