uw-ipd / RoseTTAFold2NA

RoseTTAFold2 protein/nucleic acid complex prediction
MIT License

CUDA error: CUBLAS_STATUS_INVALID_VALUE #76

Closed: denizkavi closed this issue 6 months ago

denizkavi commented 6 months ago

Hi,

While running the example command ../run_RF2NA.sh rna_pred rna_binding_protein.fa R:RNA.fa from the example folder, I get the following error:

Running on GPU
           plddt    best
Traceback (most recent call last):
  File "/home/azureuser/RoseTTAFold2NA/network/predict.py", line 374, in <module>
    pred.predict(inputs=args.inputs, out_prefix=args.prefix, ffdb=ffdb)
  File "/home/azureuser/RoseTTAFold2NA/network/predict.py", line 250, in predict
    self._run_model(Ls, msa_orig, ins_orig, t1d, t2d, xyz_t, xyz_t[:,0], alpha_t, same_chain, mask_t_2d, "%s_%02d"%(out_prefix, i_trial))
  File "/home/azureuser/RoseTTAFold2NA/network/predict.py", line 296, in _run_model
    logit_s, logit_aa_s, logit_pae, p_bind, init_crds, alpha_prev, _, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
  File "/home/azureuser/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/RoseTTAFold2NA/network/RoseTTAFoldModel.py", line 72, in forward
    msa_latent, pair, state = self.latent_emb(msa_latent, seq, idx, same_chain)
  File "/home/azureuser/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/RoseTTAFold2NA/network/Embeddings.py", line 67, in forward
    msa = self.emb(msa) # (B, N, L, d_model) # MSA embedding
  File "/home/azureuser/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/azureuser/.conda/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`

The MSA step seems to have completed correctly. Has anyone else hit this error while running the example? Thanks.

denizkavi commented 6 months ago

I was able to resolve the issue by applying the change to the conda environment described here: https://github.com/uw-ipd/RoseTTAFold2NA/issues/36#issuecomment-1851788518
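
For anyone hitting the same traceback: the failing call is a half-precision F.linear on the GPU, and the fix was a change to the conda environment, so it may help to check the environment separately from RF2NA. The snippet below is a minimal sketch and not part of the RoseTTAFold2NA code. The function name check_fp16_linear and the layer sizes are made up for illustration, and it assumes the problem lies in the PyTorch/CUDA install rather than in the model inputs. It reports the PyTorch version and the CUDA version PyTorch was built against, then runs one small float16 linear layer on the GPU.

```python
def check_fp16_linear():
    """Report PyTorch/CUDA versions and try one half-precision linear layer on the GPU.

    Hypothetical helper for illustration; it is not part of RoseTTAFold2NA.
    """
    report = {"torch": None, "cuda_build": None, "gpu": None, "fp16_linear": "not run"}

    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed in this environment

    report["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds)
    report["cuda_build"] = torch.version.cuda

    if not torch.cuda.is_available():
        return report  # no usable GPU, so the fp16 test is skipped

    report["gpu"] = torch.cuda.get_device_name(0)
    try:
        # Arbitrary small sizes; the goal is only to exercise a float16 GPU matmul
        layer = torch.nn.Linear(64, 32).cuda().half()
        x = torch.randn(8, 64, device="cuda", dtype=torch.float16)
        layer(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        report["fp16_linear"] = "ok"
    except RuntimeError as err:
        report["fp16_linear"] = f"failed: {err}"
    return report


if __name__ == "__main__":
    for key, value in check_fp16_linear().items():
        print(f"{key}: {value}")
```

Run it inside the RF2NA conda environment. If fp16_linear reports a failure here too, the problem is likely in the environment itself and not in the RF2NA inputs or the MSA step.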