mtisza1 / Cenote-Taker3

Discover and annotate the virome
MIT License

problems with downloading databases #14

Closed: F4NG666 closed this issue 3 months ago

F4NG666 commented 3 months ago

Hi,

I encountered an issue while trying to download the databases because the connection to Zenodo was refused. The link to Zenodo, https://zenodo.org/records/10840546/, no longer exists, which prevents the databases from being downloaded automatically. I also attempted to download the files manually but am struggling to place them in the correct directories. As I am new to this, I would greatly appreciate any guidance or assistance you can provide.

Thank you!

Input ct3_DBs/mmseqs_DBs/refseq_virus_prot.fasta does not exist
createtaxdb ct3_DBs/mmseqs_DBs/refseq_virus_prot_taxDB ct3_DBs/mmseqs_DBs/tmp --tax-mapping-file ct3_DBs/mmseqs_DBs/refseq_virus_prot_taxids.mmseqs_fmt.tsv

MMseqs Version:          15.6f452
NCBI tax dump directory
Taxonomy mapping file    ct3_DBs/mmseqs_DBs/refseq_virus_prot_taxids.mmseqs_fmt.tsv
Taxonomy mapping mode    0
Taxonomy db mode         1
Threads                  256
Verbosity                3

Input ct3_DBs/mmseqs_DBs/refseq_virus_prot_taxDB does not exist
running mmseqs hallmark taxdb database update/install
--2024-08-20 22:11:48--  https://zenodo.org/records/10840546/files/ct3_hallmark_nr_cd90_refseq.faa.gz
Resolving zenodo.org (zenodo.org)... 0.0.0.0, ::
Connecting to zenodo.org (zenodo.org)|0.0.0.0|:443... failed: Connection refused.
Connecting to zenodo.org (zenodo.org)|::|:443... failed: Connection refused.
gzip: ct3_DBs/mmseqs_DBs/ct3_hallmark_nr_cd90_refseq.faa.gz: No such file or directory
--2024-08-20 22:11:48--  https://zenodo.org/records/10840546/files/ct3_hallmark_nr_cd90_refseq.prot_taxids.mmseqs_fmt.tsv
Resolving zenodo.org (zenodo.org)... 0.0.0.0, ::
Connecting to zenodo.org (zenodo.org)|0.0.0.0|:443... failed: Connection refused.
Connecting to zenodo.org (zenodo.org)|::|:443... failed: Connection refused.
createdb ct3_DBs/mmseqs_DBs/ct3_hallmark_nr_cd90_refseq.faa ct3_DBs/mmseqs_DBs/ct3_hallmark.taxDB

MMseqs Version:          15.6f452
Database type            0
Shuffle input database   true
Createdb mode            0
Write lookup file        1
Offset of numeric ids    0
Compressed               0
Verbosity                3

mtisza1 commented 3 months ago

Hi,

Just to give a quick answer here: in your log, zenodo.org is resolving to 0.0.0.0, which isn't a valid address, so your connection to Zenodo is being blocked.

I think you'll want to work with your IT department or system administrator to resolve this.
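One way to confirm this from the affected machine is to look up zenodo.org directly. The sketch below is a hypothetical helper, not part of Cenote-Taker3 (the names `resolve_all` and `looks_blackholed` are made up for illustration). It resolves a host with Python's standard `socket.getaddrinfo`, the same kind of lookup wget performs before connecting, and flags the case seen in the log above, where the only answers are the unspecified addresses 0.0.0.0 and ::.

```python
import ipaddress
import socket


def resolve_all(host: str, port: int = 443) -> list[str]:
    """Hypothetical helper: every address the local resolver returns for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # info[4] is the sockaddr tuple; its first element is the address string
    # for both IPv4 and IPv6 results.
    return sorted({info[4][0] for info in infos})


def looks_blackholed(addresses: list[str]) -> bool:
    """Hypothetical helper: True if no usable address came back.

    An empty answer, or one made only of unspecified addresses such as
    0.0.0.0 and ::, means connections cannot reach the real server.
    """
    if not addresses:
        return True
    return all(ipaddress.ip_address(a).is_unspecified for a in addresses)
```

For example, `looks_blackholed(resolve_all("zenodo.org"))` returning `True` on the server would point to a DNS filter or firewall on the local network, which is something the sysadmin would need to change rather than anything in Cenote-Taker3.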

This is the database structure, generally:

{ct3_dbs}/
├── viral_cdds_and_pfams_191028.txt      <- accession list
├── hmmscan_DBs/v3.1.1/                  <- hmmscan files (.h3m)
└── mmseqs_DBs/
    ├── CDD*                             <- mmseqs DB
    ├── refseq_virus_prot_taxDB*         <- mmseqs DB
    └── ct3_hallmark.taxDB*              <- mmseqs DB
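To check a manually assembled directory against this layout, here is a minimal Python sketch. It is a hypothetical helper, not something shipped with Cenote-Taker3, and it assumes the root directory is the one the tree calls `{ct3_dbs}`. Each `*` in the tree is treated as a glob, because mmseqs writes several files per database (for example `ct3_hallmark.taxDB` and `ct3_hallmark.taxDB.index`), and the script reports which expected entries match nothing.

```python
import sys
from pathlib import Path

# Hypothetical checklist derived from the tree above; "*" covers the
# multiple files mmseqs writes for each database.
EXPECTED = [
    "viral_cdds_and_pfams_191028.txt",
    "hmmscan_DBs/v3.1.1/*.h3m",
    "mmseqs_DBs/CDD*",
    "mmseqs_DBs/refseq_virus_prot_taxDB*",
    "mmseqs_DBs/ct3_hallmark.taxDB*",
]


def missing_entries(db_root: str) -> list[str]:
    """Return the expected patterns that match no file under db_root."""
    root = Path(db_root)
    return [pattern for pattern in EXPECTED if not any(root.glob(pattern))]


if __name__ == "__main__":
    # Defaults to ./ct3_DBs when no directory is given on the command line.
    root = sys.argv[1] if len(sys.argv) > 1 else "ct3_DBs"
    missing = missing_entries(root)
    for pattern in missing:
        print(f"missing: {root}/{pattern}")
    if not missing:
        print("all expected database entries found")
```

Run against a directory where only the raw FASTA and mapping files have been copied in, it would list the three mmseqs databases as missing, since those have to be built with mmseqs rather than downloaded.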

Let me know if this helps resolve your issue.

Mike

F4NG666 commented 3 months ago

Thanks for your help! But I still have some questions. I manually downloaded all the files from Zenodo and put them in the directories shown below.

(ct3_env) t120554@starvk10-server:~/Cenote-Taker3/ct3_DBs/mmseqs_DBs$ ls -lh
total 424M
-rw-rw-r-- 1 t120554 t120554  33M Aug 21 20:24 cddid_all.tbl
-rw-rw-r-- 1 t120554 t120554 164M Aug 21 20:24 ct3_hallmark_nr_cd90_refseq.faa
-rw-rw-r-- 1 t120554 t120554 3.5M Aug 21 20:24 ct3_hallmark_nr_cd90_refseq.prot_taxids.mmseqs_fmt.tsv
-rw-rw-r-- 1 t120554 t120554 209M Aug 21 20:25 refseq_virus_prot.fasta
-rw-rw-r-- 1 t120554 t120554  15M Aug 21 20:25 refseq_virus_prot_taxids.mmseqs_fmt.tsv
drwxrwxr-x 3 t120554 t120554 4.0K Aug 21 20:01 tmp

(ct3_env) t120554@starvk10-server:~/Cenote-Taker3/ct3_DBs/hmmscan_DBs/v3.1.1$ ls -lh
total 1.3G
-rw-r--r-- 1 t120554 t120554  24M Jan 11  2024 DNA_rep_HMMs.h3m
-rw-r--r-- 1 t120554 t120554  40M Jan 11  2024 RDRP_HMMs.h3m
-rw-r--r-- 1 t120554 t120554 538M Jan 11  2024 Useful_Annotation_HMMs.h3m
-rw-r--r-- 1 t120554 t120554 377M Jan 11  2024 Virion_HMMs.h3m
-rw-r--r-- 1 t120554 t120554 271M Jan 11  2024 phrogs_for_ct.h3m

However, it still asks for some .taxDB files, and I don't know where to download them or how to create them from the files I have. I have already checked the FASTA files.

mmseqs tax db file is not found at Cenote-Taker3/ct3_DBs/mmseqs_DBs/ct3_hallmark.taxDB

mtisza1 commented 3 months ago

Sure. The mmseqs2 databases need to be built on your machine. This matters because queries must be run with the same version of mmseqs2 that was used to build the databases. Briefly, run the following in the mmseqs_DBs directory:

# RefSeq viral proteins, with taxonomy
mmseqs createdb refseq_virus_prot.fasta refseq_virus_prot_taxDB
mmseqs createtaxdb refseq_virus_prot_taxDB tmp --tax-mapping-file refseq_virus_prot_taxids.mmseqs_fmt.tsv

# Cenote-Taker3 hallmark proteins, with taxonomy
mmseqs createdb ct3_hallmark_nr_cd90_refseq.faa ct3_hallmark.taxDB
mmseqs createtaxdb ct3_hallmark.taxDB tmp --tax-mapping-file ct3_hallmark_nr_cd90_refseq.prot_taxids.mmseqs_fmt.tsv

# NCBI CDD, downloaded and set up by mmseqs
mmseqs databases CDD CDD tmp
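The five commands above can also be scripted so that they always run with the `mmseqs` binary from the currently active environment, which is the reason for building the databases locally in the first place. This is a minimal sketch under that assumption: `build_commands` and `run_all` are hypothetical names, and the script only wraps the exact commands listed above with Python's `subprocess`, stopping at the first one that fails.

```python
import shutil
import subprocess

# Hypothetical table: (source FASTA, output DB, taxonomy mapping file)
# for each taxonomy-aware database built in the reply above.
TAX_DBS = [
    ("refseq_virus_prot.fasta", "refseq_virus_prot_taxDB",
     "refseq_virus_prot_taxids.mmseqs_fmt.tsv"),
    ("ct3_hallmark_nr_cd90_refseq.faa", "ct3_hallmark.taxDB",
     "ct3_hallmark_nr_cd90_refseq.prot_taxids.mmseqs_fmt.tsv"),
]


def build_commands(tmp_dir: str = "tmp") -> list[list[str]]:
    """Return the mmseqs commands as argument lists, in execution order."""
    cmds = []
    for fasta, db, mapping in TAX_DBS:
        cmds.append(["mmseqs", "createdb", fasta, db])
        cmds.append(["mmseqs", "createtaxdb", db, tmp_dir,
                     "--tax-mapping-file", mapping])
    cmds.append(["mmseqs", "databases", "CDD", "CDD", tmp_dir])
    return cmds


def run_all(mmseqs_dir: str) -> None:
    """Run every command inside mmseqs_dir; raise on the first failure."""
    if shutil.which("mmseqs") is None:
        raise RuntimeError("mmseqs not found on PATH; activate the ct3 environment first")
    for cmd in build_commands():
        print("+", " ".join(cmd))
        subprocess.run(cmd, cwd=mmseqs_dir, check=True)
```

Calling `run_all("ct3_DBs/mmseqs_DBs")` from the Cenote-Taker3 directory would build `refseq_virus_prot_taxDB`, `ct3_hallmark.taxDB` and `CDD`. Note that `mmseqs databases CDD` downloads its data, and, as far as I know, `createtaxdb` also fetches the NCBI taxonomy dump when `--ncbi-tax-dump` is not given, so these steps may still need a working network connection.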