Open PeterCx opened 1 year ago
Sure. This tutorial could be a reference.
Steps:
Exporting taxonomic lineages of taxa with rank equal to species from GTDB-taxdump, into tabular format.
taxonkit list --data-dir gtdb-taxdump/R207/ --ids 1 --indent "" \
| taxonkit filter --data-dir gtdb-taxdump/R207/ --equal-to species \
| taxonkit reformat --data-dir gtdb-taxdump/R207/ --taxid-field 1 \
--format "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" \
-o gtdb.tsv
For the new MAGs, you need to prepare the full lineages in 7-column tabular format. You may create new species names.
# custom.tsv
$ cat custom.tsv
A B C D E F G
Creating taxdump from lineages above.
(cut -f 2- gtdb.tsv; cat custom.tsv) \
| taxonkit create-taxdump \
-R "superkingdom,phylum,class,order,family,genus,species" \
-O taxdump
Some tests.
$ echo G | taxonkit name2taxid --data-dir taxdump/
G 1630414510
$ echo 1630414510 | taxonkit lineage --data-dir taxdump/ -r
1630414510 A;B;C;D;E;F;G species
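Since `create-taxdump` in the steps above is told to expect seven ranks, it may be worth checking that every row of custom.tsv really has seven non-empty, tab-separated columns before running it. The following is only a minimal sketch: the helper name `check_lineages` and the example file are hypothetical and not part of TaxonKit.

```shell
# Hypothetical helper (not part of TaxonKit): report rows that do not have
# exactly 7 non-empty, tab-separated columns. Returns non-zero if any are found.
check_lineages() {
    awk -F'\t' '
        {
            bad = (NF != 7)
            for (i = 1; i <= NF; i++) if ($i == "") bad = 1
            if (bad) {
                printf "line %d: malformed lineage (%d fields)\n", NR, NF
                n++
            }
        }
        END { exit (n > 0) }
    ' "$1"
}

# Example with a throwaway file holding one 7-rank lineage (explicit tabs).
example=$(mktemp)
printf 'A\tB\tC\tD\tE\tF\tG\n' > "$example"
check_lineages "$example" && echo "lineage file looks OK"
rm -f "$example"
```

Running `check_lineages custom.tsv` on the real file would list any row where a rank is missing or where spaces were used instead of tabs.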
Thank you very much. This has worked, in that I have been able to generate TaxIDs for my MAGs.
However, I think something is incorrect. I was using a custom taxdump that was generated previously, and I assumed this taxdump would be updated to include my MAGs. Instead, a different taxdump is generated, which is much smaller in size, although it does contain my MAGs. See the files below. I am not sure what the problem is.
Old_TaxDump_Names.dmp.txt New_TaxDump_Names.dmp.txt
Your help is appreciated.
Kind regards,
P
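One way to narrow down a size difference like this is to list the names that appear in the old names.dmp but not in the new one. The sketch below assumes both files follow the usual NCBI-style names.dmp layout (fields separated by a tab, a `|`, and a tab, with the name in the second field); the helper name `names_only_in_first` is hypothetical.

```shell
# Hypothetical helper: print names found in the first names.dmp ($1)
# but missing from the second ($2). Assumes "\t|\t"-separated fields.
names_only_in_first() {
    first_names=$(mktemp)
    second_names=$(mktemp)
    # Field 2 is the name; sort with a fixed locale so comm sees a consistent order.
    awk -F'\t[|]\t' '{ print $2 }' "$1" | LC_ALL=C sort -u > "$first_names"
    awk -F'\t[|]\t' '{ print $2 }' "$2" | LC_ALL=C sort -u > "$second_names"
    # -23 suppresses lines unique to the second file and lines common to both.
    LC_ALL=C comm -23 "$first_names" "$second_names"
    rm -f "$first_names" "$second_names"
}

# Usage with the attached files:
#   names_only_in_first Old_TaxDump_Names.dmp.txt New_TaxDump_Names.dmp.txt | head
```

If the output consists mostly of entries at one rank or of one kind, that would hint at what the two taxdumps include differently.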
So how was the old one generated?
It was downloaded from here:
http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/taxdump/
I am using another tool called Struo2 to generate a Kraken2 GTDB database and then update it with my custom MAGs. The creator of this tool says he produced this specific taxdump using your tool.
I see. The taxdump files in Struo2 were probably generated by @nick-youngblut with an old version of TaxonKit, which might produce different TaxId values for the same lineage.
I'm not sure whether @nick-youngblut performed other transformations, because TaxonKit has saved TaxIds as int32 instead of uint32 since v0.14.0 (Nov 28, 2022), as BLAST and DIAMOND do.
Alternatively, you can try v0.12.0, which is probably the version he used, and regenerate the taxdump files.
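To see whether the int32/uint32 difference could matter for a given taxdump, one can count the TaxIds that exceed the int32 maximum (2147483647). This is only a rough check, assuming the TaxId is the first tab-separated field of names.dmp; the helper name `count_over_int32` is hypothetical.

```shell
# Hypothetical helper: count rows in a names.dmp whose TaxId (first
# tab-separated field) is larger than the int32 maximum, 2^31 - 1.
count_over_int32() {
    awk -F'\t' '$1 + 0 > 2147483647 { n++ } END { print n + 0 }' "$1"
}

# Usage with the old taxdump attached above:
#   count_over_int32 Old_TaxDump_Names.dmp.txt
```

A non-zero count would mean the file contains TaxIds that fit only in uint32, which is consistent with it having been produced before the change to int32. A count of zero does not prove anything either way.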
I'm not sure whether @nick-youngblut performed other transforms cause TaxonKit began to save taxIds in int32 instead of uint32 since v0.14.0 (Nov 28, 2022), as BLAST and DIAMOND do since v0.14.0 (Nov 28, 2022).
I did not conduct any transformations. I used the taxdump as-is for GTDB-r207. The taxdump files were downloaded in June 2022.
Hi there,
I have MAGs generated from my own samples and annotated using GTDB-Tk. They have been de-replicated, leaving me with a unique set of MAGs that are not genetically close to any reference genome in GTDB r207. Can I use this tool to get TaxIDs for my MAGs? I want to build a custom database with my MAGs and GTDB, and for this I require TaxIDs.
It's not clear to me whether this is possible using this tool. Your help is greatly appreciated.
Kind regards,
P