Lucas-Maciel closed this issue 1 year ago.
https://github.com/shenwei356/gtdb-taxdump#taxonomic-hierarchy
A GTDB species cluster contains one or more assemblies, each of which can be treated as a strain. So we can assign each assembly a TaxId with the rank "no rank" below the species rank, which also lets us track changes to these assemblies via their TaxIds later.
Don't worry about the "no rank": because it sits below the species rank, it is effectively at the "subspecies" level.
609216830 superkingdom Bacteria
947989846 phylum Firmicutes_A
1797966051 class Clostridia
1853814285 order Lachnospirales
3217231047 family Lachnospiraceae
1880979389 genus COE1
2414110737 species COE1 sp002358575
2538223356 no rank MGG00015
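To make the hierarchy above concrete, here is a minimal Python sketch that stores the example lineage as child-to-parent links and walks up from the assembly node. The TaxIds, ranks, and names are copied from the example; the parent links are assumed from the row order. The `effective_rank` helper is a hypothetical illustration of reading a "no rank" node directly under a species as subspecies-level; it is not part of taxonkit or gtdb-taxdump.

```python
from typing import Dict, List, Tuple

# (taxid, rank, name) rows from the example lineage, ordered top -> leaf.
ROWS: List[Tuple[int, str, str]] = [
    (609216830, "superkingdom", "Bacteria"),
    (947989846, "phylum", "Firmicutes_A"),
    (1797966051, "class", "Clostridia"),
    (1853814285, "order", "Lachnospirales"),
    (3217231047, "family", "Lachnospiraceae"),
    (1880979389, "genus", "COE1"),
    (2414110737, "species", "COE1 sp002358575"),
    (2538223356, "no rank", "MGG00015"),
]


def build_tree(rows: List[Tuple[int, str, str]]):
    """Return (parent, rank, name) maps; each row's parent is the previous row (assumption)."""
    parent: Dict[int, int] = {}
    rank: Dict[int, str] = {}
    name: Dict[int, str] = {}
    prev = None
    for taxid, r, n in rows:
        # The top node points to itself, mimicking how the NCBI root (TaxId 1) is its own parent.
        parent[taxid] = prev if prev is not None else taxid
        rank[taxid] = r
        name[taxid] = n
        prev = taxid
    return parent, rank, name


def lineage(taxid: int, parent: Dict[int, int], name: Dict[int, str]) -> List[str]:
    """Names from the top node down to `taxid`."""
    path = []
    while True:
        path.append(name[taxid])
        p = parent[taxid]
        if p == taxid:  # reached the self-parented top node
            break
        taxid = p
    return list(reversed(path))


def effective_rank(taxid: int, parent: Dict[int, int], rank: Dict[int, str]) -> str:
    """Hypothetical helper: report a "no rank" node directly below a species as "subspecies"."""
    r = rank[taxid]
    if r == "no rank" and rank.get(parent[taxid]) == "species":
        return "subspecies"
    return r


parent, rank, name = build_tree(ROWS)
leaf = 2538223356
print(";".join(lineage(leaf, parent, name)))
print(effective_rank(leaf, parent, rank))
```

Walking parents from the assembly TaxId recovers the full lineage, which is why a per-assembly node below the species does not break species-level lookups.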
Do you have any tips on how to safely integrate this taxdump file with the one provided by NCBI? For example, I want to use this custom GTDB taxdump together with the NCBI viral and fungal databases from Kraken2, but I'm worried about conflicts between TaxId numbers.
It's a great idea. I think my taxonomic profiling tool KMCP should also use this combined taxonomy. Previously, we used the NCBI taxonomy for reference genomes from GTDB and RefSeq.
To achieve this, you need to create the taxdump files from both the GTDB lineages and the NCBI lineages of viruses and fungi in a single run.
1) taxonkit list | taxonkit reformat.
2) taxonkit create-taxdump --gtdb and taxonkit list | taxonkit reformat.
3) taxonkit create-taxdump.
I'll add the steps to the tutorial, maybe next week (we're on holiday).
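Before combining a custom GTDB taxdump with NCBI's, one way to see whether the TaxId conflicts raised above actually occur is to compare the TaxId columns of the two nodes.dmp files. Below is a minimal Python sketch assuming both files follow the NCBI nodes.dmp layout (fields separated by "\t|\t", lines ending in "\t|"). The function names and the toy rows are hypothetical, not taken from taxonkit, and the sketch only reports overlaps; it does not renumber anything.

```python
import io
from typing import Dict, Iterable, List, Tuple

# (parent TaxId, rank) for one node.
Node = Tuple[int, str]


def parse_nodes(lines: Iterable[str]) -> Dict[int, Node]:
    """Parse nodes.dmp-style lines into {taxid: (parent, rank)}."""
    nodes: Dict[int, Node] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        fields = line.split("\t|\t")
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def compare(a: Dict[int, Node], b: Dict[int, Node]) -> Tuple[List[int], List[int]]:
    """Split TaxIds present in both dumps into (identical nodes, conflicting nodes)."""
    shared = a.keys() & b.keys()
    same = sorted(t for t in shared if a[t] == b[t])
    conflicting = sorted(t for t in shared if a[t] != b[t])
    return same, conflicting


# Toy, made-up rows standing in for two taxdump directories.
gtdb_nodes = parse_nodes(io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "100\t|\t1\t|\tsuperkingdom\t|\n"
))
ncbi_nodes = parse_nodes(io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "50\t|\t1\t|\tno rank\t|\n"
    "100\t|\t50\t|\tgenus\t|\n"
))

same, conflicting = compare(gtdb_nodes, ncbi_nodes)
print("identical:", same)
print("conflicting:", conflicting)
```

With real files you would pass open file handles for each nodes.dmp instead of the toy strings. A matching (parent, rank) pair does not prove two nodes are the same taxon, so names.dmp would also need checking before trusting a merged database.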
@shenwei356, thank you for your reply.
I'll try your instructions and check out KMCP as well.
Best,
Lucas
I've added some tutorials on Merging GTDB and NCBI taxonomy, which could help.
Hi,
I'm trying to use taxonkit create-taxdump, but I have two questions:
1) I'm using the following command, but all my "accession" names are assigned the rank "no rank" instead of "subspecies". Am I missing something?
My input has the following format:
2) Do you have any tips on how to safely integrate this taxdump file with the one provided by NCBI? For example, I want to use this custom GTDB taxdump together with the NCBI viral and fungal databases from Kraken2, but I'm worried about conflicts between TaxId numbers.
Thank you for your time.
Kind regards