shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

how to use taxid-changelog #25

Closed Phylloxera closed 4 years ago

Phylloxera commented 4 years ago

Describe your issue

Hello,

I'm confused regarding the exact purpose/use case of taxid-changelog.

Would I want to run the following:

mkdir -p archive && cd archive
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump_archive/taxdmp*.zip
ls taxdmp*.zip | rush -j 1 'unzip {} names.dmp nodes.dmp merged.dmp delnodes.dmp -d {@_(.+)\.}'
cd ..
taxonkit taxid-changelog -i archive -o taxid-changelog.csv.gz --verbose

to ensure I have the most up-to-date lineage for use with taxonkit list?

Thanks!

shenwei356 commented 4 years ago

https://github.com/shenwei356/taxid-changelog

Phylloxera commented 4 years ago

Sorry, but I read this and still don't understand. Maybe a better way to phrase my question is this: is there anything that needs to be done periodically to ensure that one has the most up-to-date lineage for use with taxonkit list? If so, can you be more explicit about what those things are and the rationale behind them (maybe as an addition to your very nice tutorial)? Or, alternatively, are the results of the taxonkit list command deterministic and independent of any changes that occur in the taxdmp* files? Thanks!

shenwei356 commented 4 years ago

The NCBI taxonomy database (dump file) is updated nearly every day, so the only way to stay up to date is to re-download ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz daily.
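A minimal sketch of such a refresh, assuming taxonkit reads its data from ~/.taxonkit (its documented default) and only needs names.dmp, nodes.dmp, delnodes.dmp and merged.dmp. The function names install_taxdump and update_taxdump are hypothetical helpers, not part of taxonkit:

```shell
#!/usr/bin/env bash
# Sketch only: refresh the local NCBI taxdump files used by taxonkit.
# Assumption: the tarball stores the .dmp files at its top level.
TAXDUMP_URL="ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"

# install_taxdump TARBALL DEST_DIR
# Extract only the four files taxonkit needs into DEST_DIR.
install_taxdump() {
    local tarball="$1" dest="$2"
    mkdir -p "$dest"
    tar -xzf "$tarball" -C "$dest" names.dmp nodes.dmp delnodes.dmp merged.dmp
}

# update_taxdump [DEST_DIR]
# Download the current dump to a temporary directory, install it,
# then clean up. DEST_DIR defaults to ~/.taxonkit.
update_taxdump() {
    local dest="${1:-$HOME/.taxonkit}"
    local tmp
    tmp="$(mktemp -d)" || return 1
    wget -q -O "$tmp/taxdump.tar.gz" "$TAXDUMP_URL" \
        && install_taxdump "$tmp/taxdump.tar.gz" "$dest"
    local status=$?   # exit status of the download/install chain
    rm -rf "$tmp"
    return "$status"
}
```

Calling update_taxdump from a daily cron job would keep the files current. If the data lives somewhere other than ~/.taxonkit, pass that directory as the argument and point taxonkit at it with --data-dir.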

taxonkit taxid-changelog is a command for tracking changes between different taxdump versions (the monthly archived dumps).
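To illustrate what tracking changes looks like in practice, here is a small sketch that pulls the history of a single TaxId out of the generated changelog. It assumes the output is a gzipped CSV with a header row and the TaxId in the first column, as described in the taxid-changelog README. The function name changelog_for_taxid is hypothetical:

```shell
# changelog_for_taxid FILE TAXID
# Print the header row plus every changelog row whose first column
# equals TAXID. Assumes the first column is an unquoted TaxId.
changelog_for_taxid() {
    local file="$1" taxid="$2"
    gzip -dc "$file" | awk -F, -v t="$taxid" 'NR == 1 || $1 == t'
}
```

For example, changelog_for_taxid taxid-changelog.csv.gz 9606 would list the recorded changes for TaxId 9606 across the archived dumps, one row per change.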

Phylloxera commented 4 years ago

The taxdump archive is not listed as a dependency for taxonkit so I'm guessing some fixed time point version of it is incorporated into the binary. Regardless, I'll go ahead and close this ticket and perhaps open a new one since my question has changed. Thanks.