prehensilecode / alphafold_singularity

Singularity recipe for AlphaFold
GNU General Public License v3.0

Cuda Driver Problems #41

Closed by aakashsahha 3 weeks ago

aakashsahha commented 3 weeks ago

Hello David,

I am a postdoc in the Barry Honig lab in the Department of Systems Biology at Columbia University. I am trying to install AlphaFold locally on our cluster with Singularity, and this GitHub repository has been very helpful in setting it up. I built the Singularity image file using the instructions from here. However, after the CPU-based MSAs complete, my runs fail owing to CUDA driver issues. I also tried the pre-built image file from here: https://cloud.sylabs.io/library/prehensilecode/alphafold_singularity/alphafold

I am attaching both my input Slurm script and the output error file: output_AF_67697.log, AF_trial_1.txt

Here is a snippet of the issue:

I1107 14:24:35.225651 140105819666240 model.py:165] Running predict with shape(feat) = {'aatype': (4, 299), 'residue_index': (4, 299), 'seq_length': (4,), 'template_aatype': (4, 4, 299), 'template_all_atom_masks': (4, 4, 299, 37), 'template_all_atom_positions': (4, 4, 299, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 299), 'msa_mask': (4, 508, 299), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 299, 3), 'template_pseudo_beta_mask': (4, 4, 299), 'atom14_atom_exists': (4, 299, 14), 'residx_atom14_to_atom37': (4, 299, 14), 'residx_atom37_to_atom14': (4, 299, 37), 'atom37_atom_exists': (4, 299, 37), 'extra_msa': (4, 5120, 299), 'extra_msa_mask': (4, 5120, 299), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 299), 'true_msa': (4, 508, 299), 'extra_has_deletion': (4, 5120, 299), 'extra_deletion_value': (4, 5120, 299), 'msa_feat': (4, 508, 299, 49), 'target_feat': (4, 299, 22)}
2024-11-07 14:24:35.612795: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
2024-11-07 14:24:35.612827: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2024-11-07 14:24:35.612883: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:277] Couldn't read CUDA driver version.
2024-11-07 14:24:35.629166: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
2024-11-07 14:24:35.629234: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 468, in <module>
    app.run(main)

Would you please look into it? Any suggestion from you would be appreciated. Thanks in advance!

Regards, Aakash

prehensilecode commented 3 weeks ago

I'm afraid I am not really maintaining this any more. This was done for my previous job, and no one uses AlphaFold at my current job.

aakashsahha commented 3 weeks ago

Thank you @prehensilecode David for your prompt reply. I will try to work on it and get back to you if I can make it work.
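For anyone hitting the same failure: the first warning in the log excerpt says the ptxas in use does not support CC 8.9, which suggests (but does not prove) that the CUDA toolkit inside the image is older than the GPU on the compute node. A minimal Python sketch for pulling the relevant details out of a long log file is below. The function name `summarize_cuda_log` is hypothetical, and the patterns assume the log lines look like the ones quoted in this issue; other XLA versions may word these messages differently.

```python
import re

# Patterns modelled on the log lines quoted in this issue (assumed format).
CC_RE = re.compile(r"ptxas does not support CC (\d+\.\d+)")
KERNEL_RE = re.compile(r'failed to get PTX kernel "([^"]+)"')
ERR_RE = re.compile(r"\b(CUDA_ERROR_[A-Z_]+)\b")


def summarize_cuda_log(text):
    """Collect the compute capability, missing kernels, and CUDA error codes in a log."""
    cc = CC_RE.search(text)
    return {
        # Compute capability that ptxas rejected, e.g. "8.9", or None if absent.
        "compute_capability": cc.group(1) if cc else None,
        # Kernel names the CUDA driver could not find in the compiled module.
        "missing_kernels": sorted(set(KERNEL_RE.findall(text))),
        # Distinct CUDA_ERROR_* codes mentioned anywhere in the log.
        "cuda_errors": sorted(set(ERR_RE.findall(text))),
    }


if __name__ == "__main__":
    # Two lines copied from the excerpt above, shortened to the message part.
    sample = (
        "asm_compiler.cc:231] Falling back to the CUDA driver for PTX "
        "compilation; ptxas does not support CC 8.9\n"
        'cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" '
        "from module: CUDA_ERROR_NOT_FOUND: named symbol not found\n"
    )
    print(summarize_cuda_log(sample))
```

To run it on a full log, read the file and pass its contents, for example `summarize_cuda_log(open("output_AF_67697.log").read())`. Comparing the reported compute capability with the CUDA version the image was built against is a reasonable first check before rebuilding the image.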