mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

download-db.sh #4

Closed Thexiyang closed 3 years ago

Thexiyang commented 3 years ago

Thanks for the very nice tool! Would it be possible to add options to the database download, e.g. to specify the storage directory (rather than the home directory) and to use wget -c? Is it also possible to download the databases manually? The databases are large, so the download can take quite some time and may fail partway through. Could there be a resume option?

mtisza1 commented 3 years ago

Hi Thexiyang,

Thank you for your interest in Cenote-Taker 2, and thank you for the question.

Let me first say that I'm currently on parental leave and I won't have time to make and test large changes to the scripts for several weeks. Sorry about that.

In the interim, can you be a little more specific about what you are requesting? I'm assuming you are referring to the HHsuite databases, as they are large and slow to download. (Though the CDD database for RPS-BLAST also takes several minutes at 2.4GB).

Are you suggesting an option that will allow users to download and access these large databases in a directory that is not a subdirectory of the Cenote-Taker2 directory?

If you are just suggesting that I make a script to confirm the completion of the database downloads, and run a complete download command in the case that it failed/stopped, I could certainly implement this with wget -c.

It will take me a little longer to formalize this, but this should work for your purposes (choose the database(s) that bugged out for you):

cd Cenote-Taker2

## NCBI_CD for HHsuite
mkdir NCBI_CD && cd NCBI_CD
wget -c https://zenodo.org/record/3660537/files/NCBI_CD_hhsuite.tgz
tar -xvf NCBI_CD_hhsuite.tgz
rm NCBI_CD_hhsuite.tgz
cd ..

## pdb70_latest for HHsuite
mkdir pdb70 && cd pdb70
wget -c http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_from_mmcif_latest.tar.gz
tar -xvf pdb70_from_mmcif_latest.tar.gz
rm pdb70_from_mmcif_latest.tar.gz
cd ..

## pfam_32 for HHsuite
mkdir pfam_32_db && cd pfam_32_db
wget -c http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pfamA_32.0.tar.gz
tar -xvf pfamA_32.0.tar.gz
rm pfamA_32.0.tar.gz
cd ..

## cdd rpsblast db
mkdir cdd_rps_db && cd cdd_rps_db
wget -c ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cdd_LE.tar.gz
tar -xvf Cdd_LE.tar.gz
rm Cdd_LE.tar.gz
cd ..

Please let me know if this was helpful, and good luck!

Mike
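The manual steps in the comment above repeat the same pattern for each database: create a directory, download with wget -c, extract, delete the archive. A minimal sketch of that pattern as a reusable function follows. The function name fetch_db and the hidden ".done" marker file are hypothetical and not part of Cenote-Taker2. The sketch assumes wget and tar are on the PATH and that a non-zero exit from tar indicates an incomplete or corrupt archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cenote-Taker2): resumable download + extract.
# Usage: fetch_db DEST_DIR URL
fetch_db() {
    local dest_dir="$1" url="$2"
    local archive="${url##*/}"         # archive name = last path component of the URL
    local marker=".${archive}.done"    # written only after a successful extraction

    mkdir -p "$dest_dir" || return 1
    (
        cd "$dest_dir" || exit 1
        # Skip databases that were already downloaded and unpacked.
        if [ -e "$marker" ]; then
            echo "skip: $archive already extracted in $dest_dir"
            exit 0
        fi
        # -c resumes a partial file left over from an interrupted run.
        wget -c "$url" || exit 1
        # Only delete the archive and write the marker if extraction succeeded.
        tar -xf "$archive" || exit 1
        rm -f "$archive"
        touch "$marker"
    )
}

# Example calls, run from the Cenote-Taker2 directory (commented out here):
# fetch_db pdb70 http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_from_mmcif_latest.tar.gz
# fetch_db cdd_rps_db ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/little_endian/Cdd_LE.tar.gz
```

If a download is interrupted, calling the function again with the same arguments picks up the partial archive instead of starting over, and databases that already finished are skipped.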

Thexiyang commented 3 years ago

Thanks! Exactly what I need. But it would still be good if the user could put the databases in a specified directory other than the Cenote-Taker2 directory ("Are you suggesting an option that will allow users to download and access these large databases in a directory that is not a subdirectory of the Cenote-Taker2 directory?"). For example, we have very limited space in our home directory.

mtisza1 commented 3 years ago

OK, I'm glad this helps! My knowledge of HPC architecture is pretty minimal (and I'm sure different systems are set up differently). I encourage users to install Cenote-Taker 2 in the /data/ directory (or equivalent) rather than the /home/ directory because of the space requirements. That said, perhaps some users can't do this. I will add the option to store the large databases in a different directory in the next update (several weeks from now). Thanks again for your interest in the tool, and please let me know if you have any other questions/issues.

Best,

Mike
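Until such an option exists, one possible workaround is to keep each database on a larger filesystem and leave a symbolic link in its place inside the Cenote-Taker2 directory. The sketch below is an untested assumption, not something confirmed in this thread: it presumes Cenote-Taker2 reaches its databases through ordinary path lookups, so that symlinks are followed transparently. The function name relocate_db is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cenote-Taker2): keep a database directory on
# another filesystem and symlink it back into the Cenote-Taker2 directory.
# Usage: relocate_db CT2_DIR DB_NAME STORAGE_DIR   (STORAGE_DIR must be absolute)
relocate_db() {
    local ct2_dir="$1" db_name="$2" storage_dir="$3"
    local src="$ct2_dir/$db_name" dst="$storage_dir/$db_name"

    # A relative link target would be resolved from the link's own directory.
    case "$storage_dir" in
        /*) ;;
        *) echo "error: STORAGE_DIR must be an absolute path" >&2; return 1 ;;
    esac

    # Already relocated: nothing to do.
    if [ -L "$src" ]; then
        echo "skip: $src is already a symlink"
        return 0
    fi

    mkdir -p "$storage_dir" || return 1
    if [ -d "$src" ]; then
        # Existing database: refuse to overwrite, otherwise move it.
        if [ -e "$dst" ]; then
            echo "error: $dst already exists" >&2
            return 1
        fi
        mv "$src" "$dst" || return 1
    else
        # No database yet: create an empty directory to download into later.
        mkdir -p "$dst" || return 1
    fi
    ln -s "$dst" "$src"
}

# Example call (commented out; /data/ct2_dbs is a placeholder path):
# relocate_db "$PWD/Cenote-Taker2" pdb70 /data/ct2_dbs
```

After relocating, the manual download commands earlier in the thread would need mkdir -p in place of mkdir, since the linked directory already exists.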