ryao-mdanderson closed this issue 3 years ago.
Hi @ryao-mdanderson,
Are you sure? During my test run I did not get any errors. The run_docker.py from AF2 points to these paths, and everything seems to work.
Code from run_docker.py:
```python
# Path to the Uniclust30 database for use by HHblits.
uniclust30_database_path = os.path.join(
    DOWNLOAD_DIR, 'uniclust30', 'uniclust30_2018_08', 'uniclust30_2018_08')
# Path to the PDB70 database for use by HHsearch.
pdb70_database_path = os.path.join(DOWNLOAD_DIR, 'pdb70', 'pdb70')
```
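A minimal sketch of how these two values resolve, assuming a hypothetical download directory `/data/af2`. The last path component (`pdb70`, `uniclust30_2018_08`) is best read as a database file-name prefix, because HH-suite databases are addressed by the stem their files share; only the parent directory has to exist. The helper name `af2_prefixes` is hypothetical; the `os.path.join` arguments are copied from run_docker.py.

```python
import os


def af2_prefixes(download_dir: str) -> dict:
    """Rebuild the two HH-suite database paths the way run_docker.py does.

    Hypothetical helper: only the os.path.join arguments come from run_docker.py.
    """
    return {
        # HHblits prefix: <dir>/uniclust30/uniclust30_2018_08/uniclust30_2018_08
        "uniclust30": os.path.join(
            download_dir, "uniclust30", "uniclust30_2018_08", "uniclust30_2018_08"
        ),
        # HHsearch prefix: <dir>/pdb70/pdb70
        "pdb70": os.path.join(download_dir, "pdb70", "pdb70"),
    }


if __name__ == "__main__":
    for name, prefix in af2_prefixes("/data/af2").items():
        # The final component is a file stem; the directory that must exist is its parent.
        print(f"{name}: prefix={prefix} directory={os.path.dirname(prefix)}")
```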
Hi @sanjaysrikakulam 👍 I haven't tested the non-Docker version on an HPC cluster. While reviewing run_alphafold.sh, I noticed the database paths: for example, I don't have $data_dir/pdb70/pdb70 in my download directory; instead, I have $data_dir/pdb70. So I am confused and still checking.
Thank you! Rong
Hi @ryao-mdanderson,
Please let me know if you get an error or find that something is not working when you test it. Also, the bash script follows AF2's run_docker.py Python script.
I'm a little confused as well. It seems to match what is in run_docker.py, but I get this error when running run_alphafold.sh:
ValueError: Could not find HHBlits database /reference/AlphaFold/uniclust30/uniclust30_2018_08/uniclust30_2018_08
When I check the download, there doesn't seem to be a uniclust30_2018_08 directory:
```
> ls /reference/AlphaFold/uniclust30
uniclust30_2018_08_a3m_db.index uniclust30_2018_08.cs219 uniclust30_2018_08.cs219.sizes uniclust30_2018_08_hhm.ffindex
uniclust30_2018_08_a3m.ffdata uniclust30_2018_08_cs219.ffdata uniclust30_2018_08_hhm_db.index uniclust30_2018_08_md5sum
uniclust30_2018_08_a3m.ffindex uniclust30_2018_08_cs219.ffindex uniclust30_2018_08_hhm.ffdata
```
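The error is consistent with the configured path being treated as a file-name prefix rather than a directory. The sketch below assumes that AF2's HHblits wrapper validates each database by globbing for `<path>_*`; check this against your own checkout. Under that assumption, it rebuilds a flat layout like the listing above in a temporary directory and tests two candidate prefixes. The function names and the temporary layout are hypothetical.

```python
import glob
import os
import tempfile


def prefix_exists(prefix: str) -> bool:
    """Assumed AF2-style check: at least one file must match '<prefix>_*'."""
    return bool(glob.glob(prefix + "_*"))


def reproduce_flat_layout() -> tuple:
    """Rebuild a flat uniclust30/ layout like the listing above and test two prefixes."""
    with tempfile.TemporaryDirectory() as root:
        uniclust = os.path.join(root, "uniclust30")
        os.makedirs(uniclust)
        # Files sit directly in uniclust30/, with no uniclust30_2018_08/ subdirectory.
        for suffix in ("_a3m.ffdata", "_a3m.ffindex", "_hhm.ffdata", "_hhm.ffindex"):
            open(os.path.join(uniclust, "uniclust30_2018_08" + suffix), "w").close()

        # Default path from run_docker.py / run_alphafold.sh (expects the subdirectory).
        default_prefix = os.path.join(uniclust, "uniclust30_2018_08", "uniclust30_2018_08")
        # Prefix that matches this flat layout.
        flat_prefix = os.path.join(uniclust, "uniclust30_2018_08")
        return prefix_exists(default_prefix), prefix_exists(flat_prefix)


if __name__ == "__main__":
    print(reproduce_flat_layout())  # (False, True)
```

The right value therefore depends on whether the archive was extracted into its own uniclust30_2018_08/ subdirectory. With AF2's layout, the longer default is correct. With a flat extraction like this one, $data_dir/uniclust30/uniclust30_2018_08 is the matching prefix. The same reasoning applies to pdb70/pdb70.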
Hi @dldereklee
This is AF2's directory structure:
```
$DOWNLOAD_DIR/                             # Total: ~ 2.2 TB (download: 438 GB)
    bfd/                                   # ~ 1.7 TB (download: 271.6 GB)
        # 6 files.
    mgnify/                                # ~ 64 GB (download: 32.9 GB)
        mgy_clusters_2018_12.fa
    params/                                # ~ 3.5 GB (download: 3.5 GB)
        # 5 CASP14 models,
        # 5 pTM models,
        # LICENSE,
        # = 11 files.
    pdb70/                                 # ~ 56 GB (download: 19.5 GB)
        # 9 files.
    pdb_mmcif/                             # ~ 206 GB (download: 46 GB)
        mmcif_files/
            # About 180,000 .cif files.
        obsolete.dat
    small_bfd/                             # ~ 17 GB (download: 9.6 GB)
        bfd-first_non_consensus_sequences.fasta
    uniclust30/                            # ~ 86 GB (download: 24.9 GB)
        uniclust30_2018_08/
            # 13 files.
    uniref90/                              # ~ 58 GB (download: 29.7 GB)
        uniref90.fasta
```
I am not sure how you downloaded your data or why it has a different directory structure. If your directory structure does not match AF2's, you can update the paths in the bash script (run_alphafold.sh).
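Before editing run_alphafold.sh, it can help to compare a download directory against the tree above. The following is a minimal sketch for that. The plain files and directories are transcribed from the tree. The two HH-suite entries use the prefixes from run_docker.py and assume the `<prefix>_*` naming convention. bfd/ and params/ are left out because the tree only gives file counts for them. The function and constant names are hypothetical.

```python
import glob
import os
import sys

# Entries transcribed from the AF2 directory tree (relative to $DOWNLOAD_DIR).
EXPECTED_FILES = [
    "mgnify/mgy_clusters_2018_12.fa",
    "pdb_mmcif/obsolete.dat",
    # small_bfd is only used in the reduced-database mode; drop it if you use full BFD.
    "small_bfd/bfd-first_non_consensus_sequences.fasta",
    "uniref90/uniref90.fasta",
]
EXPECTED_DIRS = ["pdb_mmcif/mmcif_files"]
# HH-suite database prefixes from run_docker.py; files are assumed to be named '<prefix>_*'.
EXPECTED_PREFIXES = [
    "pdb70/pdb70",
    "uniclust30/uniclust30_2018_08/uniclust30_2018_08",
]


def missing_entries(download_dir: str) -> list:
    """Return the expected entries that are absent under download_dir."""
    missing = []
    for rel in EXPECTED_FILES:
        if not os.path.isfile(os.path.join(download_dir, rel)):
            missing.append(rel)
    for rel in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(download_dir, rel)):
            missing.append(rel + "/")
    for rel in EXPECTED_PREFIXES:
        if not glob.glob(os.path.join(download_dir, rel) + "_*"):
            missing.append(rel + "_*")
    return missing


if __name__ == "__main__":
    for entry in missing_entries(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("missing:", entry)
```

Any prefix reported as missing is a candidate for adjusting the corresponding path variable in run_alphafold.sh.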
Hi @sanjaysrikakulam
The directory structure is really helpful. I just realized I don't have the small_bfd directory downloaded.
Thanks!
Hi @sanjaysrikakulam 👍
I reviewed the scripts directory, which has all the download shell scripts. download_all_data.sh does not have code to download the small_bfd directory. How did you download this directory? How can I get bfd-first_non_consensus_sequences.fasta?
Thanks!
Hi @ryao-mdanderson
It looks like download_all_data.sh has a conditional download:
```bash
if [[ "${DOWNLOAD_MODE}" = full_dbs ]] ; then
  echo "Downloading BFD..."
  bash "${SCRIPT_DIR}/download_bfd.sh" "${DOWNLOAD_DIR}"
else
  echo "Downloading Small BFD..."
  bash "${SCRIPT_DIR}/download_small_bfd.sh" "${DOWNLOAD_DIR}"
fi
```
I manually downloaded all the data using wget and rsync. I did not use the AF2 download scripts.
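The if/else in download_all_data.sh means each run fetches only one of the two BFD variants. The following is a hypothetical Python mirror of that branch, which can help when deciding which BFD directory a layout check should expect. The function name is illustrative. The mode string and script names come from the snippet above, and the directory names come from the AF2 tree.

```python
def bfd_download_for(mode: str) -> tuple:
    """Mirror the if/else in download_all_data.sh: return (script, target directory).

    Hypothetical helper; the directory names are taken from the AF2 tree.
    """
    if mode == "full_dbs":
        return ("download_bfd.sh", "bfd")
    # Any other value falls through to the else branch, as in the shell script.
    return ("download_small_bfd.sh", "small_bfd")


if __name__ == "__main__":
    for mode in ("full_dbs", "reduced_dbs"):
        print(mode, "->", bfd_download_for(mode))
```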
@sanjaysrikakulam Thank you very much. I see. I cloned the repository on July 19, so my scripts/download_all_data.sh does not have this if-else condition, and there is no download_small_bfd.sh. I have re-cloned a newer version.
@ryao-mdanderson
I think you don't need small BFD if you download the full BFD database.
@sanjaysrikakulam Sorry to bother you again.
```
bash run_alphafold.sh -d ./alphafold_data/ -o ./dummy_test/ -m model_1 -f ./example/query.fasta -t 2020-05-14
```
What does alphafold_data (the -d flag) refer to in this example? Thanks!
Hi @ryao-mdanderson
It's the download directory containing all the databases AF2 requires.
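Because -d must point at that download directory, a wrapper can check it before launching anything. The following is a minimal Python sketch that does this and then runs the example command. The flags and values are copied from the example above. The meaning of -t (a maximum template release date) is an assumption, and the function names are hypothetical.

```python
import os
import shlex
import subprocess


def build_command(data_dir: str, output_dir: str, fasta_path: str,
                  model: str = "model_1", template_date: str = "2020-05-14") -> list:
    """Assemble the example run_alphafold.sh call; flags are copied from the thread."""
    return [
        "bash", "run_alphafold.sh",
        "-d", data_dir,        # download directory holding all AF2 databases
        "-o", output_dir,      # output directory
        "-m", model,           # model name, as in the example
        "-f", fasta_path,      # query FASTA file
        "-t", template_date,   # assumed: maximum template release date (YYYY-MM-DD)
    ]


def run_example(data_dir: str, output_dir: str, fasta_path: str) -> None:
    """Hypothetical wrapper: fail early if -d is not an existing directory."""
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(f"-d should be the AF2 download directory: {data_dir}")
    cmd = build_command(data_dir, output_dir, fasta_path)
    print(shlex.join(cmd))  # show the exact command before running it
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_example("./alphafold_data/", "./dummy_test/", "./example/query.fasta")
```

Checking the directory up front turns a wrong -d into an immediate, explicit error instead of a database lookup failure later in the pipeline.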
Hello author, in the database path section of run_alphafold.sh, should these lines:
```
pdb70_database_path="$data_dir/pdb70/pdb70"
uniclust30_database_path="$data_dir/uniclust30/uniclust30_2018_08/uniclust30_2018_08"
```
be changed to the following?
```
pdb70_database_path="$data_dir/pdb70"
uniclust30_database_path="$data_dir/uniclust30/uniclust30_2018_08"
```
Thank you! Rong