erdaqorri closed this issue 1 year ago.
This seems to be an issue with using AlphaFold/HHSearch. I am afraid I am unable to assist with using the application.
@erdaqorri Did you find a solution for this? I'm having the same issue!
@GentleInfant I eventually could not figure this out, so I ended up using the non-Docker implementation (https://github.com/kalininalab/alphafold_non_docker) and also ParaFold (https://github.com/Zuricho/ParallelFold). I benchmarked both and ParaFold seems to be faster. If you have questions, let me know.
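For anyone who wants to reproduce that comparison: both wrappers ultimately call run_alphafold.py, which writes a timings.json into each target's output directory, so total wall-clock time can be compared by summing its entries. A minimal sketch, assuming both wrappers keep that output, that jq is installed, and that the two run directories below (which are hypothetical placeholders) each hold one finished prediction:

# Compare total pipeline time recorded in timings.json for two runs.
# The run directories are hypothetical placeholders; adjust to your layout.
for run in runs/non_docker/P02818_R94Q runs/parafold/P02818_R94Q; do
  # 'jq add' sums all stage durations (in seconds) in the JSON object.
  printf '%s: %s s total\n' "$run" "$(jq 'add' "$run/timings.json")"
done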
Hi, I am running AlphaFold2 using the prebuilt container that you provided. I made the necessary changes and keep running into the issue below. What could be causing this? (I use cuda/11.4 on my system)
SLURM_JOBID=1955090
SLURM_JOB_NODELIST=x1001c6s0b0n1
SLURMTMPDIR=
Number of Nodes Allocated = 1
Number of Tasks Allocated =
Number of Cores/Task Allocated = 50
Working Directory = /home/p_af2qe
working directory = /project/home/p_af2qe
I0811 11:40:09.772996 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/pilot_experiment -> /mnt/fasta_path_0
I0811 11:40:09.773108 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/uniref90 -> /mnt/uniref90_database_path
I0811 11:40:09.773177 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/mgnify -> /mnt/mgnify_database_path
I0811 11:40:09.773239 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db -> /mnt/data_dir
I0811 11:40:09.773295 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/pdb_mmcif -> /mnt/template_mmcif_dir
I0811 11:40:09.773350 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/pdb_mmcif -> /mnt/obsolete_pdbs_path
I0811 11:40:09.773405 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/pdb70 -> /mnt/pdb70_database_path
I0811 11:40:09.773465 140093178564032 run_singularity_container_shantanu.py:134] Binding /home/p_af2qe/monomer_af2_db/small_bfd -> /mnt/small_bfd_database_path
I0811 11:40:09.773520 140093178564032 run_singularity_container_shantanu.py:243] Binding /home/p_af2qe/pilot_experiment/af2_dir_output/P02818_R94Q_af2_output -> /mnt/output
I0811 11:40:09.773569 140093178564032 run_singularity_container_shantanu.py:247] Binding /scratch/tmp/slurm-1955090 -> /tmp
/home/p_af2qe/af2_singularity/alphafold2/alphafold_2.3.2-1.sif
singularity run --nv --bind /home/p_af2qe/pilot_experiment:/mnt/fasta_path_0,/home/p_af2qe/monomer_af2_db/uniref90:/mnt/uniref90_database_path,/home/p_af2qe/monomer_af2_db/mgnify:/mnt/mgnify_database_path,/home/p_af2qe/monomer_af2_db:/mnt/data_dir,/home/p_af2qe/monomer_af2_db/pdb_mmcif:/mnt/template_mmcif_dir,/home/p_af2qe/monomer_af2_db/pdb_mmcif:/mnt/obsolete_pdbs_path,/home/p_af2qe/monomer_af2_db/pdb70:/mnt/pdb70_database_path,/home/p_af2qe/monomer_af2_db/small_bfd:/mnt/small_bfd_database_path,/home/p_af2qe/pilot_experiment/af2_dir_output/P02818_R94Q_af2_output:/mnt/output,/scratch/tmp/slurm-1955090:/tmp --env NVIDIA_VISIBLE_DEVICES=all --env TF_FORCE_UNIFIED_MEMORY=1 --env XLA_PYTHON_CLIENT_MEM_FRACTION=4.0 /home/p_af2qe/af2_singularity/alphafold2/alphafold_2.3.2-1.sif --fasta_paths=/mnt/fasta_path_0/P02818_R94Q.fasta --uniref90_database_path=/mnt/uniref90_database_path/uniref90.fasta --mgnify_database_path=/mnt/mgnify_database_path/mgy_clusters_2022_05.fa --data_dir=/mnt/data_dir --template_mmcif_dir=/mnt/template_mmcif_dir/mmcif_files --obsolete_pdbs_path=/mnt/obsolete_pdbs_path/obsolete.dat --pdb70_database_path=/mnt/pdb70_database_path/pdb70 --small_bfd_database_path=/mnt/small_bfd_database_path/bfd-first_non_consensus_sequences.fasta --output_dir=/mnt/output --max_template_date=2023-06-14 --db_preset=reduced_dbs --model_preset=monomer --benchmark=False --use_precomputed_msas=False --num_multimer_predictions_per_model=5 --models_to_relax=best --use_gpu_relax=True --logtostderr
INFO: Converting SIF file to temporary sandbox...
WARNING: underlay of /usr/bin/nvidia-smi required more than 50 (335) bind mounts
I0811 11:40:42.019533 140598436288320 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I0811 11:40:49.932570 140598436288320 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I0811 11:40:50.601016 140598436288320 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host Interpreter CUDA
I0811 11:40:50.601294 140598436288320 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I0811 11:40:50.601345 140598436288320 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I0811 11:40:58.049372 140598436288320 run_alphafold.py:424] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0811 11:40:58.049500 140598436288320 run_alphafold.py:438] Using random seed 293772752548622048 for the data pipeline
I0811 11:40:58.049652 140598436288320 run_alphafold.py:185] Predicting P02818_R94Q
I0811 11:40:58.059766 140598436288320 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp89vvmnxm/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/P02818_R94Q.fasta /mnt/uniref90_database_path/uniref90.fasta"
I0811 11:40:58.088092 140598436288320 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0811 11:46:54.855233 140598436288320 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 356.767 seconds
I0811 11:46:54.861927 140598436288320 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpcbig0us/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/P02818_R94Q.fasta /mnt/mgnify_database_path/mgy_clusters_2022_05.fa"
I0811 11:46:54.878192 140598436288320 utils.py:36] Started Jackhmmer (mgy_clusters_2022_05.fa) query
I0811 11:57:49.916906 140598436288320 utils.py:40] Finished Jackhmmer (mgy_clusters_2022_05.fa) query in 655.038 seconds
I0811 11:57:49.936748 140598436288320 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmphd_4efsz/query.a3m -o /tmp/tmphd_4efsz/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I0811 11:57:49.954006 140598436288320 utils.py:36] Started HHsearch query
I0811 11:57:50.089833 140598436288320 utils.py:40] Finished HHsearch query in 0.136 seconds
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 468, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 443, in main
    predict_structure(
  File "/app/alphafold/run_alphafold.py", line 196, in predict_structure
    feature_dict = data_pipeline.process(
  File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
    pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
  File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 94, in query
    raise RuntimeError(
RuntimeError: HHSearch failed:
stdout:
stderr:
INFO: Cleaning up image...
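For readers who hit the same empty "HHSearch failed" error: hhsearch returning after about 0.1 seconds with blank stdout and stderr means it exited almost immediately, so two things are worth checking first. Confirm that the pdb70 database under the bound directory is complete, and re-run the exact hhsearch command by hand inside the same image so its own error message is not swallowed. A minimal sketch, assuming the host paths from the command above; the ffdata/ffindex file names are the standard HH-suite pdb70 layout, and query.a3m is a hypothetical small test alignment:

# 1) Check that the pdb70 files the container will see are all present and non-empty.
ls -lh /home/p_af2qe/monomer_af2_db/pdb70/pdb70_a3m.ffdata \
       /home/p_af2qe/monomer_af2_db/pdb70/pdb70_a3m.ffindex \
       /home/p_af2qe/monomer_af2_db/pdb70/pdb70_hhm.ffdata \
       /home/p_af2qe/monomer_af2_db/pdb70/pdb70_hhm.ffindex \
       /home/p_af2qe/monomer_af2_db/pdb70/pdb70_cs219.ffdata \
       /home/p_af2qe/monomer_af2_db/pdb70/pdb70_cs219.ffindex

# 2) Re-run the failing step manually inside the same image so hhsearch's stderr is visible.
#    query.a3m is a hypothetical test alignment placed in the current directory.
singularity exec \
  --bind /home/p_af2qe/monomer_af2_db/pdb70:/mnt/pdb70_database_path \
  --bind "$PWD":/mnt/work \
  /home/p_af2qe/af2_singularity/alphafold2/alphafold_2.3.2-1.sif \
  hhsearch -i /mnt/work/query.a3m -o /mnt/work/output.hhr \
           -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70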