soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

Create GTDB DNA Taxonomy database in MMseqs2 #884

Open feixiang1209 opened 2 months ago

feixiang1209 commented 2 months ago

I would like to create a GTDB DNA taxonomy database in MMseqs2. However, the manual only describes how to create the GTDB amino acid database. Could you please advise how I can create the DNA taxonomy database? Which files from GTDB should I download?

Thanks a lot

feixiang1209 commented 2 months ago

What I did was:

(1) Downloaded gtdb_genomes_reps.tar.gz and combined all the fa files into one fa file (combined.fa).

(2) Downloaded ar53_taxonomy.tsv and bac120_taxonomy.tsv and combined them into one tsv (combined.tsv).

(3) mmseqs createdb combined.fa gtdb_seqs

(4) mmseqs createtaxdb gtdb_seqs tmp --tax-mapping-file combined.tsv --threads 50

No errors were reported during the whole process. However, when I used the "mmseqs taxonomy" function to search one contig.fa file, all the contigs were unclassified. Could you please advise what I did wrong?
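
One possible sanity check for a setup like this is to count how many sequence identifiers in the combined FASTA actually occur as keys in the first column of the mapping TSV. The sketch below is a hypothetical helper (count_mapped is not part of MMseqs2) and rests on two assumptions: that the sequence identifier is the header text up to the first whitespace (MMseqs2 may parse some header formats differently), and that --tax-mapping-file is matched on those identifiers. The GTDB taxonomy TSVs appear to be keyed by genome accession rather than contig name, so a count of 0 here would be consistent with every contig ending up unclassified.

```shell
#!/bin/sh
# Hypothetical helper (not an MMseqs2 command): report how many FASTA
# sequence IDs appear in column 1 of a tab-separated mapping file.
# Assumption: the sequence ID is the header text between ">" and the
# first whitespace character.
count_mapped() {
    fasta=$1
    mapping=$2
    awk '
        # Mapping file: remember every key found in column 1.
        FILENAME == ARGV[1] {
            split($0, cols, "\t")
            keys[cols[1]] = 1
            next
        }
        # FASTA file: look only at header lines.
        /^>/ {
            split($0, parts, /[ \t]/)
            id = substr(parts[1], 2)   # drop the leading ">"
            total++
            if (id in keys) matched++
        }
        END {
            printf "%d of %d sequence IDs found in mapping\n", matched, total
        }
    ' "$mapping" "$fasta"
}
```

For example, count_mapped combined.fa combined.tsv would print a single summary line for the files described above. A low count suggests that the keys in the mapping file and the FASTA headers do not refer to the same things.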

Thanks a lot