xiaxichen closed this issue 9 months ago
Hi @xiaxichen ,
From the output you shared, it looks like the error comes from a CUDA version incompatibility. I have seen reports of similar issues in the AlphaFold repository. Could you try changing the CUDA version as suggested in https://github.com/google-deepmind/alphafold/issues/55#issuecomment-889914885?
Best, Rui
Is there a Docker image for tcrmodel?
Currently, we don't offer a Docker version of the image. However, we do have a Singularity container available, should you be willing to explore that option. I would like to note that adjusting the CUDA version to resolve the issue should not require either Docker or Singularity. You may try to change the CUDA version directly on your system, which could potentially fix the problem without the need for these containers.
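Before changing anything, it can help to see which CUDA tools and JAX packages the active environment actually picks up. The following is a minimal sketch, not part of tcrmodel: the function name `describe_environment` is hypothetical, and it only assumes that the tools of interest (`nvcc`, `ptxas`, `nvidia-smi`) would be found through `PATH` if they are installed.

```python
import shutil
from importlib import metadata


def describe_environment(packages=("jax", "jaxlib"),
                         tools=("nvcc", "ptxas", "nvidia-smi")):
    """Collect installed package versions and the paths of CUDA tools on PATH.

    Hypothetical helper for illustration; the package and tool names are
    defaults that can be overridden.
    """
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is not installed in this environment
    for tool in tools:
        report[tool] = shutil.which(tool)  # None when the tool is not on PATH
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value or 'not found'}")
```

Running this inside the conda environment used for the prediction shows whether the `ptxas` being picked up belongs to the CUDA toolkit you intended to use.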
I now get the error below. How can I solve it? The GPU is an A10.
`
Thu Feb  8 21:46:24 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   19C    P8    15W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I0208 21:42:39.288548 140233354532672 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I0208 21:42:39.288659 140233354532672 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 20.
I0208 21:42:39.300979 140233354532672 pipeline_multimer_custom_templates.py:221] Running monomer pipeline on chain C: pMHC
I0208 21:42:39.301134 140233354532672 pipeline_custom_templates.py:251] input_sequence is RLPAKAPLLSHSLKYFHTSVSRPGRGEPRFISVGYVDDTQFVRFDNDAASPRMVPRAPWMEQEGSEYWDRETRSARDTAQIFRVNLRTLRGYYNQSEAGSHTLQWMHGCELGPDGRFLRGYEQFAYDGKDYLTLNEDLRSWTAVDTAAQISEQKSNDASEAEHQRAYLEDTCVEWLHKYLEKGKETL
I0208 21:42:39.352231 140233354532672 pipeline_custom_templates.py:430] Final (deduplicated) MSA size: 1 sequences.
I0208 21:42:39.352303 140233354532672 pipeline_custom_templates.py:432] Total number of templates (NB: this can include bad templates and is later filtered to top 4, and mock/no templates would also show number of templates as 1): 4.
I0208 21:42:39.373223 140233354532672 run_alphafold_tcrmodel2.3.py:340] Running model model_1_multimer_v3_pred_0 on test_clsI_6kzw_pmhc_oc
I0208 21:42:39.373742 140233354532672 model.py:180] Running predict with shape(feat) = {'aatype': (414,), 'residue_index': (414,), 'seq_length': (), 'msa': (4096, 414), 'num_alignments': (), 'template_aatype': (4, 414), 'template_all_atom_mask': (4, 414, 37), 'template_all_atom_positions': (4, 414, 37, 3), 'asym_id': (414,), 'sym_id': (414,), 'entity_id': (414,), 'deletion_matrix': (4096, 414), 'deletion_mean': (414,), 'all_atom_mask': (414, 37), 'all_atom_positions': (414, 37, 3), 'assembly_num_chains': (), 'entity_mask': (414,), 'num_templates': (), 'cluster_bias_mask': (4096,), 'bert_mask': (4096, 414), 'seq_mask': (414,), 'msa_mask': (4096, 414)}
2024-02-08 21:42:39.390710: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:114] WARNING You are using ptxas 11.0.221, which is older than 11.1. ptxas before 11.1 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.
2024-02-08 21:42:39.391881: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.6
2024-02-08 21:42:39.391900: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2024-02-08 21:42:39.393683: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
2024-02-08 21:42:39.393721: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
Traceback (most recent call last):
  File "run_alphafold_tcrmodel2.3.py", line 712, in <module>
    app.run(main)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run_alphafold_tcrmodel2.3.py", line 671, in main
    predict_structure(
  File "run_alphafold_tcrmodel2.3.py", line 352, in predict_structure
    prediction_result = model_runner.predict(processed_feature_dict,
  File "/opt/tcrmodel2/alphafold/model/model.py", line 182, in predict
    result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 132, in PRNGKey
    key = prng.seed_with_impl(impl, seed)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
    return random_seed(seed, impl=impl)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 580, in random_seed
    return random_seed_p.bind(seeds_arr, impl=impl)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 329, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
    return primitive.impl(*tracers, params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
    base_arr = random_seed_impl_base(seeds, impl=impl)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
    return seed(seeds)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/prng.py", line 832, in threefry_seed
    lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
    return shift_right_logical_p.bind(x, y)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 329, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
    return primitive.impl(*tracers, params)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
    return compiled_fun(args)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
    return lambda args, kw: compiled(*args, kw)[0]
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
    out_flat = compiled.execute(in_flat)
jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function
Traceback (most recent call last):
  File "run_tcrmodel2.py", line 336, in <module>
    app.run(main)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/ubuntu/anaconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "run_tcrmodel2.py", line 238, in main
    with open("%s/%s_pmhc_oc/model_scores.txt" % (out_dir, job_id)) as fh:
FileNotFoundError: [Errno 2] No such file or directory: 'experiments/test_clsI_6kzw/test_clsI_6kzw_pmhc_oc/model_scores.txt'
`
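The XLA warning in the log above reports `ptxas 11.0.221`, says versions before 11.1 are known to miscompile XLA code, and notes that this `ptxas` does not support CC 8.6. A minimal sketch for checking the `ptxas` found on `PATH` against that threshold is below. It assumes `ptxas --version` prints a version string of the form `V11.0.221`; the helper names `parse_ptxas_version` and `ptxas_is_too_old` are hypothetical, and the 11.1 minimum is taken only from the warning text, not from an official compatibility table.

```python
import re
import shutil
import subprocess

# Minimum version quoted in the XLA warning in the log (an assumption beyond that).
MIN_PTXAS = (11, 1)


def parse_ptxas_version(text):
    """Extract (major, minor, patch) from `ptxas --version` output, or None."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def ptxas_is_too_old(version, minimum=MIN_PTXAS):
    """Return True when `version` is older than `minimum` (tuple comparison)."""
    return version[:len(minimum)] < minimum


if __name__ == "__main__":
    ptxas = shutil.which("ptxas")
    if ptxas is None:
        print("ptxas not found on PATH")
    else:
        result = subprocess.run([ptxas, "--version"],
                                capture_output=True, text=True)
        version = parse_ptxas_version(result.stdout)
        if version is None:
            print(f"could not parse a version from {ptxas}")
        elif ptxas_is_too_old(version):
            print(f"{ptxas} is {version}, older than {MIN_PTXAS}")
        else:
            print(f"{ptxas} is {version}, at or above {MIN_PTXAS}")
```

If the reported version is older than 11.1, that matches the warning in the log. Note also that the final `FileNotFoundError` for `model_scores.txt` comes after the first traceback, so it is most likely a consequence of the failed prediction rather than a separate problem.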