kalininalab / alphafold_non_docker

AlphaFold2 non-docker setup

RuntimeError: HHblits failed #9

Closed: Wanghair closed this issue 1 year ago.

Wanghair commented 3 years ago

Dear author, I followed the steps to configure the environment (CPU), but at the end run_alphafold.sh reported an error:

/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/flags/_validators.py:203: UserWarning: Flag --preset has a non-None default value; therefore, mark_flag_as_required will pass even if flag is not specified in the command line!
  warnings.warn(
I0811 03:41:17.217879 140270426367808 templates.py:837] Using precomputed obsolete pdbs ./DOWNLOAD_DIR/pdb_mmcif/obsolete.dat.
I0811 03:41:18.356215 140270426367808 tpu_client.py:54] Starting the local TPU driver.
I0811 03:41:18.357059 140270426367808 xla_bridge.py:214] Unable to initialize backend 'tpu_driver': Not found: Unable to find driver in registry given worker: local://
2021-08-11 03:41:18.358851: W external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-08-11 03:41:18.358934: W external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
I0811 03:41:18.359137 140270426367808 xla_bridge.py:214] Unable to initialize backend 'gpu': Failed precondition: No visible GPU devices.
I0811 03:41:18.359338 140270426367808 xla_bridge.py:214] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
W0811 03:41:18.359463 140270426367808 xla_bridge.py:217] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
I0811 03:41:19.314316 140270426367808 run_alphafold.py:260] Have 1 models: ['model_1']
I0811 03:41:19.314738 140270426367808 run_alphafold.py:273] Using random seed 4975129475860990710 for the data pipeline
I0811 03:41:19.336503 140270426367808 jackhmmer.py:130] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp1s6fhn/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./example/query.fasta ./DOWNLOAD_DIR/uniref90/uniref90.fasta"
I0811 03:41:19.413159 140270426367808 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I0811 03:50:05.036837 140270426367808 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 525.623 seconds
I0811 03:50:05.040448 140270426367808 jackhmmer.py:130] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpqrhjvvgw/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 ./example/query.fasta ./DOWNLOAD_DIR/mgnify/mgy_clusters.fa"
I0811 03:50:05.241362 140270426367808 utils.py:36] Started Jackhmmer (mgy_clusters.fa) query
I0811 04:01:40.166194 140270426367808 utils.py:40] Finished Jackhmmer (mgy_clusters.fa) query in 694.879 seconds
I0811 04:01:40.621608 140270426367808 hhsearch.py:76] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/hhsearch -i /tmp/tmp996yhgxj/query.a3m -o /tmp/tmp996yhgxj/output.hhr -maxseq 1000000 -d ./DOWNLOAD_DIR/pdb70/pdb70"
I0811 04:01:40.838742 140270426367808 utils.py:36] Started HHsearch query
I0811 04:12:59.336633 140270426367808 utils.py:40] Finished HHsearch query in 678.436 seconds
I0811 04:12:59.917971 140270426367808 hhblits.py:128] Launching subprocess "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/hhblits -i ./example/query.fasta -cpu 4 -oa3m /tmp/tmpyemalf6z/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d ./DOWNLOAD_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d ./DOWNLOAD_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
I0811 04:13:00.089679 140270426367808 utils.py:36] Started HHblits query
I0811 04:13:23.778954 140270426367808 utils.py:40] Finished HHblits query in 23.689 seconds
E0811 04:13:23.779619 140270426367808 hhblits.py:138] HHblits failed. HHblits stderr begin:
E0811 04:13:23.779794 140270426367808 hhblits.py:141] - 04:13:23.681 ERROR: Could find neither hhm_db nor a3m_db!
E0811 04:13:23.779950 140270426367808 hhblits.py:142] HHblits stderr end
Traceback (most recent call last):
  File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 303, in <module>
    app.run(main)
  File "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 277, in main
    predict_structure(
  File "/lustre/user/lulab/gaojd/whr/alphafold/run_alphafold.py", line 127, in predict_structure
    feature_dict = data_pipeline.process(
  File "/lustre/user/lulab/gaojd/whr/alphafold/alphafold/data/pipeline.py", line 170, in process
    hhblits_bfd_uniclust_result = self.hhblits_bfd_uniclust_runner.query(
  File "/lustre/user/lulab/gaojd/whr/alphafold/alphafold/data/tools/hhblits.py", line 143, in query
    raise RuntimeError('HHblits failed\nstdout:\n%s\n\nstderr:\n%s\n' % (
RuntimeError: HHblits failed
stdout:

stderr:

I don't know what caused this, so I'd appreciate some help. Thanks!

ryao-mdanderson commented 3 years ago

@Wanghair hello,

Your run got one step further than mine. The error message says HHblits is looking for hhm_db or a3m_db; I haven't reached this step yet, so I can't say. Are these supposed to be in the downloaded databases? I can't find them.

Your run hit the error in the cloned repository at alphafold/data/tools/hhblits.py, line 141, which means the retcode on line 134 was non-zero. My run failed earlier, at line 133: stdout, stderr = process.communicate().

I'd appreciate any suggestions from @sanjaysrikakulam.

Thanks!

Wanghair commented 3 years ago

Hello @ryao-mdanderson, thank you for your suggestions. I think hhm_db or a3m_db is related to the pdb70 database, but I still can't find the problem. It may take some time. Thanks!

sanjaysrikakulam commented 3 years ago

Hi @Wanghair

Can you show me the contents of the uniclust30/uniclust30_2018_08/ folder? Here, HHblits uses the bfd and uniclust30 databases:

"/lustre/user/lulab/gaojd/whr/software/miniconda3/envs/alphafold/bin/hhblits -i ./example/query.fasta -cpu 4 -oa3m /tmp/tmpyemalf6z/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d ./DOWNLOAD_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d ./DOWNLOAD_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08"

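The -d flags in the command above are what tell HHblits which databases to open, so one quick check is to pull those prefixes out of a logged command line. A minimal Python sketch using the standard-library shlex module; the function name database_prefixes is hypothetical and not part of AlphaFold:

```python
import shlex
from typing import List

def database_prefixes(command: str) -> List[str]:
    """Return the value following each -d flag in an hhblits/hhsearch command line."""
    tokens = shlex.split(command)
    # tokens[:-1] skips a trailing "-d" that has no value after it.
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-d"]

# Abridged from the hhblits command in the log.
cmd = (
    "hhblits -i ./example/query.fasta -cpu 4 -oa3m /tmp/out.a3m -o /dev/null "
    "-d ./DOWNLOAD_DIR/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt "
    "-d ./DOWNLOAD_DIR/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
)
for prefix in database_prefixes(cmd):
    print(prefix)
```

Both prefixes here are relative to the directory the job was started from (./DOWNLOAD_DIR), and the last path component is a file-name stem rather than a file: for uniclust30 the files on disk are uniclust30_2018_08_a3m.ffdata, uniclust30_2018_08_hhm.ffdata and so on, as in the folder listing in this thread.
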
This is the folder structure AF2 requires:

$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 438 GB)
    bfd/                                   # ~ 1.7 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniclust30/                            # ~ 86 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 58 GB (download: 29.7 GB)
        uniref90.fasta
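To compare a local download against the layout above, a small Python sketch can check that each expected sub-directory and named file exists. The EXPECTED list is transcribed from that tree (folders described only by a file count are checked as directories); missing_paths and the /path/to/DOWNLOAD_DIR placeholder are hypothetical. File names can differ between database versions (the log in this issue uses mgnify/mgy_clusters.fa rather than mgy_clusters_2018_12.fa), so adjust the list to match your download.

```python
import os
from typing import List

# Paths relative to $DOWNLOAD_DIR, transcribed from the folder structure above.
# Folders described only by a file count are checked as directories.
EXPECTED = [
    "bfd",
    "mgnify/mgy_clusters_2018_12.fa",
    "params",
    "pdb70",
    "pdb_mmcif/mmcif_files",
    "pdb_mmcif/obsolete.dat",
    "small_bfd/bfd-first_non_consensus_sequences.fasta",
    "uniclust30/uniclust30_2018_08",
    "uniref90/uniref90.fasta",
]

def missing_paths(download_dir: str) -> List[str]:
    """Return the expected paths that do not exist under download_dir."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(download_dir, p))]

# Hypothetical location; replace with your own absolute path.
for path in missing_paths("/path/to/DOWNLOAD_DIR"):
    print("missing:", path)
```
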

This is what my uniclust30 folder looks like; it contains the a3m and hhm databases:

uniclust30
└── uniclust30_2018_08
    ├── uniclust30_2018_08_a3m_db -> uniclust30_2018_08_a3m.ffdata
    ├── uniclust30_2018_08_a3m_db.index
    ├── uniclust30_2018_08_a3m.ffdata
    ├── uniclust30_2018_08_a3m.ffindex
    ├── uniclust30_2018_08.cs219
    ├── uniclust30_2018_08_cs219.ffdata
    ├── uniclust30_2018_08_cs219.ffindex
    ├── uniclust30_2018_08.cs219.sizes
    ├── uniclust30_2018_08_hhm_db -> uniclust30_2018_08_hhm.ffdata
    ├── uniclust30_2018_08_hhm_db.index
    ├── uniclust30_2018_08_hhm.ffdata
    ├── uniclust30_2018_08_hhm.ffindex
    └── uniclust30_2018_08_md5sum

1 directory, 13 files

Please fix your data/download directory and make sure the required databases are in the uniclust30 directory; that should fix this problem.
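Because HHblits only reports that it "could find neither hhm_db nor a3m_db", it can help to check the uniclust30_2018_08 folder against the 13-file listing above. A minimal Python sketch; the file names are copied from that listing, check_uniclust30 is a hypothetical name, and which of these files a given HH-suite version actually opens is not established here, so the check is deliberately conservative.

```python
import os
from typing import List

PREFIX = "uniclust30_2018_08"
# Suffixes of the 13 files in the listing above.
SUFFIXES = [
    "_a3m_db", "_a3m_db.index", "_a3m.ffdata", "_a3m.ffindex",
    ".cs219", "_cs219.ffdata", "_cs219.ffindex", ".cs219.sizes",
    "_hhm_db", "_hhm_db.index", "_hhm.ffdata", "_hhm.ffindex",
    "_md5sum",
]

def check_uniclust30(folder: str) -> List[str]:
    """Return the expected file names that are missing from folder.

    os.path.exists follows symlinks, so a dangling _a3m_db or _hhm_db
    link is reported as missing too.
    """
    return [
        PREFIX + s
        for s in SUFFIXES
        if not os.path.exists(os.path.join(folder, PREFIX + s))
    ]
```

Pass the inner sub-directory, e.g. check_uniclust30("/path/to/DOWNLOAD_DIR/uniclust30/uniclust30_2018_08") (a hypothetical path), since the -d prefix in the command ends in .../uniclust30_2018_08/uniclust30_2018_08.
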

Wanghair commented 3 years ago

Hello @sanjaysrikakulam, the contents of my uniclust30/uniclust30_2018_08/ folder are as follows:

uniclust30
└── uniclust30_2018_08
    ├── uniclust30_2018_08_a3m_db
    ├── uniclust30_2018_08_a3m_db.index
    ├── uniclust30_2018_08_a3m.ffdata
    ├── uniclust30_2018_08_a3m.ffindex
    ├── uniclust30_2018_08.cs219
    ├── uniclust30_2018_08_cs219.ffdata
    ├── uniclust30_2018_08_cs219.ffindex
    ├── uniclust30_2018_08.cs219.sizes
    ├── uniclust30_2018_08_hhm_db
    ├── uniclust30_2018_08_hhm_db.index
    ├── uniclust30_2018_08_hhm.ffdata
    ├── uniclust30_2018_08_hhm.ffindex
    └── uniclust30_2018_08_md5sum

It looks the same as your directory. Or do I need to do something to the files in this directory? Thanks!

sanjaysrikakulam commented 3 years ago

Hi @Wanghair

Your files are in the top-level directory uniclust30, but if you check mine, they are inside the sub-directory uniclust30_2018_08:

uniclust30                          <----- Parent directory
└── uniclust30_2018_08              <----- Sub-directory
    ├── uniclust30_2018_08_a3m_db -> uniclust30_2018_08_a3m.ffdata
    ├── uniclust30_2018_08_a3m_db.index
    ├── uniclust30_2018_08_a3m.ffdata
    ├── uniclust30_2018_08_a3m.ffindex
    ├── uniclust30_2018_08.cs219
    ├── uniclust30_2018_08_cs219.ffdata
    ├── uniclust30_2018_08_cs219.ffindex
    ├── uniclust30_2018_08.cs219.sizes
    ├── uniclust30_2018_08_hhm_db -> uniclust30_2018_08_hhm.ffdata
    ├── uniclust30_2018_08_hhm_db.index
    ├── uniclust30_2018_08_hhm.ffdata
    ├── uniclust30_2018_08_hhm.ffindex
    └── uniclust30_2018_08_md5sum

1 directory, 13 files

Wanghair commented 3 years ago

Hello @sanjaysrikakulam, I used download_all_data.sh to download the databases. I checked my database directory structure again, and it is the same as yours, but the run still fails with the same error. Thanks!

sanjaysrikakulam commented 3 years ago

Hello @Wanghair

I am unable to reproduce your error. Also, please use absolute paths instead of relative paths (such as ./DOWNLOAD_DIR). At the moment I can't think of why AF2 is failing. Can you please try the latest version of our non-docker bash script and the latest version of AF2?
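The log shows relative paths such as ./DOWNLOAD_DIR/pdb70/pdb70 and ./example/query.fasta, which are resolved against whatever directory the job happens to run in. A minimal Python sketch that turns a database directory into an absolute path up front and fails early if it does not exist; resolve_download_dir is a hypothetical helper, not part of run_alphafold.sh.

```python
import os

def resolve_download_dir(path: str) -> str:
    """Expand ~ and make path absolute; raise if it is not an existing directory."""
    abs_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isdir(abs_path):
        raise FileNotFoundError(f"database directory not found: {abs_path}")
    return abs_path
```

The returned value can then be passed wherever the script expects the database directory, so every database path in the HHblits, HHsearch and Jackhmmer commands no longer depends on the current working directory.
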

Wanghair commented 3 years ago

Hi @sanjaysrikakulam, thank you very much for your suggestions. I will keep looking for the cause of my error. Thanks!

ryao-mdanderson commented 3 years ago

@sanjaysrikakulam @Wanghair 👍 I have successfully run the test case. @Wanghair, besides @sanjaysrikakulam's suggestion to use the latest version, please also check that you have the two symbolic links set up under uniclust30_2018_08/:

lrwxrwxrwx 1 528745 9100 29 Oct 11 2018 uniclust30_2018_08_a3m_db -> uniclust30_2018_08_a3m.ffdata
lrwxrwxrwx 1 528745 9100 29 Oct 11 2018 uniclust30_2018_08_hhm_db -> uniclust30_2018_08_hhm.ffdata
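The two links in that ls output can also be checked programmatically. Wanghair's earlier listing shows the _a3m_db and _hhm_db entries without "->" arrows, which may mean they are regular files or may just mean the listing tool did not print link targets. A minimal Python sketch that reports each entry's state; link and target names are copied from the ls output, and check_links is a hypothetical name. A status other than "ok" is something to look at, not proof of the cause.

```python
import os
from typing import Dict

LINKS = {
    "uniclust30_2018_08_a3m_db": "uniclust30_2018_08_a3m.ffdata",
    "uniclust30_2018_08_hhm_db": "uniclust30_2018_08_hhm.ffdata",
}

def check_links(folder: str) -> Dict[str, str]:
    """Map each link name to 'ok', 'missing', 'not a symlink', 'dangling' or 'wrong target'."""
    status = {}
    for link, target in LINKS.items():
        path = os.path.join(folder, link)
        if not os.path.lexists(path):          # nothing at all under that name
            status[link] = "missing"
        elif not os.path.islink(path):         # a regular file or directory
            status[link] = "not a symlink"
        elif not os.path.exists(path):         # link whose target does not exist
            status[link] = "dangling"
        elif os.path.basename(os.readlink(path)) != target:
            status[link] = "wrong target"
        else:
            status[link] = "ok"
    return status
```
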

Wanghair commented 3 years ago

Hello @ryao-mdanderson, first of all, thank you for your suggestions. I have tried many ways to solve the problem, but it still reports the error and I don't know why.

Wanghair commented 3 years ago

Hello @sanjaysrikakulam, sorry to bother you again, but I still have some questions. My run failed at line 134 of alphafold/data/tools/hhblits.py: retcode = process.wait(). Is the error caused by the process blocking? Does it have anything to do with my running the CPU version? Thanks!
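On the blocking question, a simplified Python sketch of the pattern discussed in this thread (collect output with communicate(), read the exit status with wait(), raise if it is non-zero) may help. It is not the actual AlphaFold source: run_tool is a hypothetical name, and a short Python child process stands in for hhblits. Popen.communicate() already waits for the child to exit, so the wait() that follows returns the exit code immediately. In the original log, "Finished HHblits query in 23.689 seconds" is printed before the error, which suggests HHblits exited with an error status rather than hanging.

```python
import subprocess
import sys
from typing import List, Tuple

def run_tool(cmd: List[str]) -> Tuple[str, str]:
    """Run cmd, return (stdout, stderr); raise RuntimeError on a non-zero exit code."""
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and waits for the process to exit.
    stdout, stderr = process.communicate()
    # The process has already exited here, so wait() returns its exit code at once.
    retcode = process.wait()
    if retcode:
        raise RuntimeError(
            "tool failed (exit code %d)\nstderr:\n%s" % (retcode, stderr.decode("utf-8"))
        )
    return stdout.decode("utf-8"), stderr.decode("utf-8")

# A stand-in child process that writes an error message and exits with status 1.
failing = [sys.executable, "-c", "import sys; sys.stderr.write('ERROR: demo\\n'); sys.exit(1)"]
try:
    run_tool(failing)
except RuntimeError as err:
    print(err)
```
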

ryao-mdanderson commented 3 years ago

@Wanghair, I noticed that in alphafold/data/tools, hhblits.py is set to use 4 CPU cores, and hmmsearch.py and jackhmmer.py are set to use 8. You might check the number of cores available on your system.
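To see how many cores a process can actually use (as opposed to how many are installed), a short Python sketch can compare os.cpu_count() with the CPU affinity mask. os.sched_getaffinity is Linux-only, so the sketch falls back to os.cpu_count() elsewhere; it does not account for cgroup CPU quotas. The threshold of 8 comes from the --cpu 8 in the jackhmmer commands in the log, and usable_cpus is a hypothetical name. Whether having fewer cores causes a failure, rather than just slowing the search down, is not established in this thread.

```python
import os

def usable_cpus() -> int:
    """Number of CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # Linux: honours taskset/scheduler CPU masks
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1  # fallback: all CPUs the OS reports

NEEDED = 8  # largest CPU count in the logged commands (jackhmmer --cpu 8)
have = usable_cpus()
print(f"reported by OS: {os.cpu_count()}, usable by this process: {have}")
if have < NEEDED:
    print(f"only {have} usable CPUs, fewer than the {NEEDED} requested by jackhmmer")
```
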

sanjaysrikakulam commented 3 years ago

Hi @Wanghair

Can you please delete the bfd and uniclust30 databases, re-download them, and try again? Many users seem to have trouble with HHblits; see #14.
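The uniclust30 folder ships a uniclust30_2018_08_md5sum file, which can be used to check whether the existing files are intact before or after re-downloading. A minimal Python sketch, assuming that file uses the usual md5sum output format (one "<hex digest>  <file name>" pair per line); if it does not, the parsing needs adjusting. md5_of and verify_md5 are hypothetical names.

```python
import hashlib
import os
from typing import List

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in 1 MiB chunks so multi-GB files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_md5(md5sum_file: str) -> List[str]:
    """Return the names of listed files that are missing or whose digest does not match."""
    folder = os.path.dirname(os.path.abspath(md5sum_file))
    bad = []
    with open(md5sum_file) as f:
        for line in f:
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            # md5sum marks binary-mode entries with a leading "*" on the file name.
            expected, name = parts[0].lower(), parts[1].strip().lstrip("*")
            path = os.path.join(folder, name)
            if not os.path.isfile(path) or md5_of(path) != expected:
                bad.append(name)
    return bad
```

Hashing the large .ffdata files takes a while, so expect this to run for some time on the full folder.
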

wsatbluesky commented 3 years ago

Hi @sanjaysrikakulam, I can run the test as root without errors, but when I switch to a regular user, the test fails with the same error as @Wanghair's.

sanjaysrikakulam commented 3 years ago

@wsatbluesky This might be due to folder or user permissions. Please check #14.
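Since the test works as root but not as a regular user, one thing to check is whether the regular user can read every database file and enter every directory. A minimal Python sketch using os.access, which checks against the real user and group IDs; run it as the user who launches AlphaFold, because root can read almost everything and would report little. unreadable_paths is a hypothetical name.

```python
import os
from typing import List

def unreadable_paths(root: str) -> List[str]:
    """Return paths under root that the current (real) user cannot read or enter."""
    if not os.access(root, os.R_OK | os.X_OK):
        return [root]  # cannot even list the top directory
    problems = []
    for dirpath, dirnames, filenames in os.walk(root):
        for d in dirnames:
            full = os.path.join(dirpath, d)
            # Directories need read (list) and execute (enter) permission.
            if not os.access(full, os.R_OK | os.X_OK):
                problems.append(full)
        for name in filenames:
            full = os.path.join(dirpath, name)
            # os.access follows symlinks: an unreadable or missing link target is flagged too.
            if not os.access(full, os.R_OK):
                problems.append(full)
    return problems
```

Pointing it at the folders HHblits reads, for example unreadable_paths("/path/to/DOWNLOAD_DIR/uniclust30") and the bfd folder (hypothetical paths), is much faster than walking the whole download directory with its roughly 180,000 mmCIF files.
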

Lsz-20 commented 2 years ago

Quoting @Wanghair: "@sanjaysrikakulam hello Sorry to disturb you again. But I still have some questions. My run failed at line 134(alphafold/data/tools/hhblits.py): retcode = process.wait(). Is the cause of the error due to process blocking? Does this have anything to do with my CPU version? Thanks!"

Hello, have you solved this problem? I have the same question. [screenshot]

Lsz-20 commented 2 years ago

@sanjaysrikakulam Hello, I think I have the same problem. I have tried 'monomer' and it works well, but 'multimer' shows me this: [screenshot] Thanks!

sanjaysrikakulam commented 2 years ago

Hi @Lsz-20

I found some threads that discuss similar problems; please check:

https://github.com/soedinglab/hh-suite/issues/277
https://github.com/soedinglab/hh-suite/issues/279

sanjaysrikakulam commented 2 years ago

Also check this as well https://github.com/kalininalab/alphafold_non_docker/issues/14

Lsz-20 commented 2 years ago

Quoting @sanjaysrikakulam: "Also check this as well #14"

Oh, thanks for your help! I'll try that first.