kalininalab / alphafold_non_docker

AlphaFold2 non-docker setup

RuntimeError: HHSearch failed: #37

Closed CristianoOliveira1 closed 1 year ago

CristianoOliveira1 commented 2 years ago

It seemed to be running... until it wasn't, although it produced some data in the meantime.

I was running it on an AWS instance with 60 cores, 477 GiB RAM, and 8 GPUs.

I have pasted the output logs below. Any idea what the problem could be? Thank you.

I ran it with this sample FASTA file as query.fasta:

T1050 A7LXT1, Bacteroides Ovatus, 779 residues| MASQSYLFKHLEVSDGLSNNSVNTIYKDRDGFMWFGTTTGLNRYDGYTFKIYQHAENEPGSLPDNYITDIVEMPDGRFWINTARGYVLFDKERDYFITDVTGFMKNLESWGVPEQVFVDREGNTWLSVAGEGCYRYKEGGKRLFFSYTEHSLPEYGVTQMAECSDGILLIYNTGLLVCLDRATLAIKWQSDEIKKYIPGGKTIELSLFVDRDNCIWAYSLMGIWAYDCGTKSWRTDLTGIWSSRPDVIIHAVAQDIEGRIWVGKDYDGIDVLEKETGKVTSLVAHDDNGRSLPHNTIYDLYADRDGVMWVGTYKKGVSYYSESIFKFNMYEWGDITCIEQADEDRLWLGTNDHGILLWNRSTGKAEPFWRDAEGQLPNPVVSMLKSKDGKLWVGTFNGGLYCMNGSQVRSYKEGTGNALASNNVWALVEDDKGRIWIASLGGGLQCLEPLSGTFETYTSNNSALLENNVTSLCWVDDNTLFFGTASQGVGTMDMRTREIKKIQGQSDSMKLSNDAVNHVYKDSRGLVWIATREGLNVYDTRRHMFLDLFPVVEAKGNFIAAITEDQERNMWVSTSRKVIRVTVASDGKGSYLFDSRAYNSEDGLQNCDFNQRSIKTLHNGIIAIGGLYGVNIFAPDHIRYNKMLPNVMFTGLSLFDEAVKVGQSYGGRVLIEKELNDVENVEFDYKQNIFSVSFASDNYNLPEKTQYMYKLEGFNNDWLTLPVGVHNVTFTNLAPGKYVLRVKAINSDGYVGIKEATLGIVVNPPFKLAAALQHHHHHH

The data generated:

ubuntu@run-62387ab63902662cbe274d7c-4d7kq:/mnt$ tree -sh /mnt/example/
/mnt/example/
├── [4.0K]  dummy_test
│   └── [4.0K]  query
│       └── [4.0K]  msas
│           ├── [3.4M]  mgnify_hits.sto
│           └── [ 72M]  uniref90hits.sto
└── [ 830]  query.fasta

3 directories, 3 files

ubuntu@run-62387ab63902662cbe274d7c-4d7kq:/app/alphafold$ sudo ./run_alphafold.sh -d /domino/datasets/af_download_data/ -o /mnt/example/dummy_test -f /mnt/example/query.fasta -t 2022-03-21
/opt/conda/lib/python3.7/site-packages/absl/flags/_validators.py:206: UserWarning: Flag --use_gpu_relax has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  'command line!' % flag_name)
I0321 13:29:02.076551 139820712200000 templates.py:857] Using precomputed obsolete pdbs /domino/datasets/af_download_data//pdb_mmcif/obsolete.dat.
I0321 13:29:03.170220 139820712200000 tpu_client.py:54] Starting the local TPU driver.
I0321 13:29:03.171494 139820712200000 xla_bridge.py:212] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
I0321 13:29:05.166625 139820712200000 xla_bridge.py:212] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
I0321 13:29:21.223274 139820712200000 run_alphafold.py:384] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I0321 13:29:21.223868 139820712200000 run_alphafold.py:400] Using random seed 1019557854010524627 for the data pipeline
I0321 13:29:21.224538 139820712200000 run_alphafold.py:168] Predicting query
I0321 13:29:21.225994 139820712200000 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpa75vmfip/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/example/query.fasta /domino/datasets/af_download_data//uniref90/uniref90.fasta"
I0321 13:29:21.309449 139820712200000 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0321 13:37:13.182801 139820712200000 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 471.873 seconds
I0321 13:37:19.727575 139820712200000 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpq_3sjpki/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/example/query.fasta /domino/datasets/af_download_data//mgnify/mgy_clusters_2018_12.fa"
I0321 13:37:19.829512 139820712200000 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I0321 13:44:58.966890 139820712200000 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 459.137 seconds
I0321 13:45:22.831639 139820712200000 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpxw1gqa3o/query.a3m -o /tmp/tmpxw1gqa3o/output.hhr -maxseq 1000000 -d /domino/datasets/af_download_data//pdb70/pdb70"
I0321 13:45:22.918177 139820712200000 utils.py:36] Started HHsearch query
I0321 13:45:23.270786 139820712200000 utils.py:40] Finished HHsearch query in 0.352 seconds
Traceback (most recent call last):
  File "/app/alphafold/run_alphafold.py", line 429, in <module>
    app.run(main)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/app/alphafold/run_alphafold.py", line 413, in main
    random_seed=random_seed)
  File "/app/alphafold/run_alphafold.py", line 181, in predict_structure
    msa_output_dir=msa_output_dir)
  File "/app/alphafold/alphafold/data/pipeline.py", line 188, in process
    pdb_templates_result = self.template_searcher.query(uniref90_msa_as_a3m)
  File "/app/alphafold/alphafold/data/tools/hhsearch.py", line 96, in query
    stdout.decode('utf-8'), stderr[:100_000].decode('utf-8')))
RuntimeError: HHSearch failed:
stdout:

stderr:
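Since HHsearch exits after only 0.352 seconds with empty stdout and stderr, one way to get more information (a hedged debugging sketch, not something tried in this thread) is to repeat the same HHsearch call from the log by hand against the uniref90 MSA that the pipeline already saved, converting the Stockholm file to A3M first with hh-suite's reformat.pl. The temporary output paths below are illustrative only:

# Convert the saved Stockholm MSA to A3M (reformat.pl ships with hh-suite),
# then re-run the HHsearch command from the log so any error message is printed directly.
reformat.pl sto a3m /mnt/example/dummy_test/query/msas/uniref90hits.sto /tmp/query.a3m
hhsearch -i /tmp/query.a3m -o /tmp/output.hhr -maxseq 1000000 \
         -d /domino/datasets/af_download_data/pdb70/pdb70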

kernyu commented 2 years ago

I also got this error while running AlphaFold on SLURM. The steps leading up to HHsearch run just fine, and the resulting files don't seem very different from previous successful runs, but the error pops up when the job is run under SLURM.

sanjaysrikakulam commented 2 years ago

Hi,

Sorry for the delayed response. Maybe the SLURM nodes require additional memory (possibly virtual memory; see this HHblits issue #12). Unfortunately, I do not have a SLURM setup to test this, nor the money to invest in such a large AWS instance.
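If the failure really is a per-job memory cap on the SLURM nodes, a submission sketch along these lines (the memory and GPU values are placeholders, not tested here) would raise the job's allocation and print the limits the job actually sees:

#!/bin/bash
#SBATCH --job-name=alphafold
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G            # placeholder; the HHsearch step against pdb70 can need tens of GB
#SBATCH --gres=gpu:1

# A low virtual-memory limit can also kill hh-suite tools; print the effective limits for comparison.
ulimit -a

bash run_alphafold.sh -d /path/to/af_download_data -o /path/to/output -f query.fasta -t 2022-03-21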