kalininalab / alphafold_non_docker

AlphaFold2 non-docker setup

HHblits failed #12

Closed barbarashih closed 2 years ago

barbarashih commented 3 years ago

Dear author, I have been getting an error in HHblits and I wonder whether you can tell what is wrong. I ran the script with use_gpu=False (although I couldn't work out how this setting is passed on in the shell script). Here is the log.

I0816 11:44:49.701374 47500092145344 hhblits.py:128] Launching subprocess "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/conda/alphafold/bin/hhblits -i /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta -cpu 16 -oa3m /tmp/tmps6dyg_bz/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0816 11:44:49.762680 47500092145344 utils.py:36] Started HHblits query
I0816 20:54:02.312248 47500092145344 utils.py:40] Finished HHblits query in 32952.549 seconds
E0816 20:54:02.322368 47500092145344 hhblits.py:138] HHblits failed. HHblits stderr begin:
E0816 20:54:02.322453 47500092145344 hhblits.py:141] - 11:45:38.035 INFO: Searching 65983866 column state sequences.
E0816 20:54:02.322493 47500092145344 hhblits.py:141] - 11:45:38.954 INFO: Searching 15161831 column state sequences.
E0816 20:54:02.322529 47500092145344 hhblits.py:141] - 11:45:39.035 INFO: /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta is in A2M, A3M or FASTA format
E0816 20:54:02.322567 47500092145344 hhblits.py:141] - 11:45:39.041 INFO: Iteration 1
E0816 20:54:02.322600 47500092145344 hhblits.py:141] - 11:45:39.072 INFO: Prefiltering database
E0816 20:54:02.322632 47500092145344 hhblits.py:141] - 19:15:38.345 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 378332
E0816 20:54:02.322664 47500092145344 hhblits.py:141] - 20:54:00.555 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 141755
E0816 20:54:02.322696 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 2000
E0816 20:54:02.322729 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000
E0816 20:54:02.322766 47500092145344 hhblits.py:141] - 20:54:00.751 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment
E0816 20:54:02.322798 47500092145344 hhblits.py:141] - 20:54:01.122 INFO: Alternative alignment: 0
E0816 20:54:02.322830 47500092145344 hhblits.py:142] HHblits stderr end
Traceback (most recent call last):
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 302, in <module>
    app.run(main)
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/conda/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/conda/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 276, in main
    predict_structure(
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/run_alphafold.py", line 126, in predict_structure
    feature_dict = data_pipeline.process(
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/alphafold/data/pipeline.py", line 178, in process
    hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
  File "/exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/alphafold/alphafold/data/tools/hhblits.py", line 143, in query
    raise RuntimeError('HHblits failed\nstdout:\n%s\n\nstderr:\n%s\n' % (
RuntimeError: HHblits failed
stdout:

stderr:
- 11:45:38.035 INFO: Searching 65983866 column state sequences.

- 11:45:38.954 INFO: Searching 15161831 column state sequences.

- 11:45:39.035 INFO: /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta is in A2M, A3M or FASTA format

- 11:45:39.041 INFO: Iteration 1

- 11:45:39.072 INFO: Prefiltering database

- 19:15:38.345 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 378332

- 20:54:00.555 INFO: HMMs passed 1st prefilter (gapless profile-profile alignment)  : 141755

- 20:54:00.751 INFO: HMMs passed 2nd prefilter (gapped profile-profile alignment)   : 2000

- 20:54:00.751 INFO: HMMs passed 2nd prefilter and not found in previous iterations : 2000

- 20:54:00.751 INFO: Scoring 2000 HMMs using HMM-HMM Viterbi alignment

- 20:54:01.122 INFO: Alternative alignment: 0

Thank you for your help and for making your non-docker alphafold solution!

sanjaysrikakulam commented 3 years ago

Hi @barbarashih

Can you please share the command you used, and also the compute resources of the machine on which you ran AF2?

barbarashih commented 3 years ago

I used

bash run_alphafold.sh \
 -d /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/databases \
 -o /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/analysis/alphafold \
 -m model_1 \
 -f /exports/cmvm/eddie/eb/groups/EEID_Mareks_IBV/members/roslin_bioinformatics/2021-07-23-_9707_EEID_AlphaFold_setup/data/example_PB1F2/AFH41240.1.fasta \
 -p full_dbs \
 -g false \
 -t "2021-08-11"

The amino acid sequence is 91 residues long. I used 16 CPU threads, with 32 GB per thread.

sanjaysrikakulam commented 3 years ago

Hi @barbarashih

Please check the solution in #14

barbarashih commented 2 years ago

Hi @sanjaysrikakulam, thank you for your response. I tried the solutions in that thread and unfortunately they didn't work. However, I eventually worked out that hhblits seems to require a large amount of virtual memory (around 2 TB), even though it needs far less physical memory. The way our high-performance computing cluster is set up, a job's physical memory normally has to match the virtual memory it requests. Our IT support resolved this by creating a new node that accepts a large virtual memory request while having less physical memory (I believe it can run with 256 GB of physical memory), and I can now run AlphaFold 2 to completion. I got the same error with your non-docker version, with the docker-converted Singularity version, and with hhblits run as a stand-alone command against the full BFD database.
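
To check whether a job environment imposes a virtual memory cap of the kind described here, a minimal Python sketch is shown below. It assumes the scheduler enforces the cap as the process address-space limit (`RLIMIT_AS`, the value `ulimit -v` reports), which may not hold on every cluster. The ~2 TB threshold is only the rough figure from this comment, not a documented HHblits requirement, and the function names are hypothetical.

```python
import resource

# Rough virtual memory figure reported in this thread for HHblits on the
# full BFD database. This is an estimate, not a documented requirement.
REQUIRED_VIRTUAL_BYTES = 2 * 1024**4  # ~2 TB


def address_space_sufficient(soft_limit, required=REQUIRED_VIRTUAL_BYTES):
    """Return True if the soft address-space limit allows `required` bytes."""
    # RLIM_INFINITY means "no limit"; check it first because it may be -1.
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required


def describe_limit(soft_limit):
    """Format an address-space limit for printing."""
    if soft_limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft_limit / 1024**3:.1f} GiB"


if __name__ == "__main__":
    # Assumption: the scheduler applies its virtual memory cap via RLIMIT_AS.
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    print("Address-space (virtual memory) soft limit:", describe_limit(soft))
    if not address_space_sufficient(soft):
        print("Below ~2 TB: HHblits on the full BFD may fail in this job.")
```

Running this inside the same job environment as AlphaFold (for example at the start of the submission script) shows the limit the hhblits subprocess would inherit.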

Hope the explanation is clear and might help others with the same problem.
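
To reproduce the failure outside AlphaFold, the hhblits invocation from the log in the first comment can be rebuilt as a stand-alone command. The sketch below only assembles the argument list, using the flags exactly as they appear in that log. The helper name `build_hhblits_command` and the placeholder paths are hypothetical and need to be replaced with real locations.

```python
import shlex


def build_hhblits_command(hhblits_bin, fasta_path, output_a3m,
                          database_prefixes, n_cpu=16):
    """Assemble an hhblits command using the flags seen in the log above."""
    cmd = [
        hhblits_bin,
        "-i", fasta_path,
        "-cpu", str(n_cpu),
        "-oa3m", output_a3m,
        "-o", "/dev/null",
        "-n", "3",
        "-e", "0.001",
        "-maxseq", "1000000",
        "-realign_max", "100000",
        "-maxfilt", "100000",
        "-min_prefilter_hits", "1000",
    ]
    # One -d flag per database prefix (the log passes BFD and Uniclust30).
    for prefix in database_prefixes:
        cmd += ["-d", prefix]
    return cmd


if __name__ == "__main__":
    # Placeholder paths: hypothetical, replace with your own locations.
    cmd = build_hhblits_command(
        hhblits_bin="hhblits",
        fasta_path="query.fasta",
        output_a3m="output.a3m",
        database_prefixes=[
            "databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt",
            "databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08",
        ],
    )
    # Print a shell-quoted command that can be pasted into a job script.
    print(shlex.join(cmd))
```

The printed command can be run directly in a job script, which separates memory problems in hhblits from anything specific to the AlphaFold wrapper.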

sanjaysrikakulam commented 2 years ago

Hi @barbarashih

Great, glad to know! Thank you!