shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Exclude nodes from the taxonomy #98

Open fgvieira opened 1 month ago

fgvieira commented 1 month ago

Is it possible to remove a node (or a list of nodes) and, optionally, all of its descendants? I am trying to remove all branches of the taxonomy that are unclassified (e.g. taxid 12908).

Something like (when #93 is done):

echo unclassified | taxonkit name2taxid | taxonkit list | taxonkit filter --exclude | taxonkit create-taxdump

shenwei356 commented 1 month ago

How about directly editing the nodes.dmp file?

  1. Get a list of IDs with taxonkit list or by other means.
  2. csvtk grep -Ht -f 1 -v -P list.txt nodes.dmp > nodes.new
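
For readers without csvtk, the row filtering in step 2 can be approximated with plain awk. This is only a sketch: the function name exclude_taxids and the file names are made up for illustration, and it assumes the usual NCBI taxdump layout, in which the taxid is the first tab-separated field of each row.

```shell
# Hypothetical helper (not part of taxonkit or csvtk): print every row of a
# taxdump-style file whose first column (the taxid) is NOT in the ID list.
#   $1: file with one taxid per line (blank lines are ignored)
#   $2: nodes.dmp-style file, fields separated by "\t|\t"
exclude_taxids() {
    awk -F'\t' '
        # First file: remember each non-empty taxid to drop.
        FILENAME == ARGV[1] { if ($1 != "") drop[$1] = 1; next }
        # Second file: keep rows whose taxid was not listed.
        !($1 in drop)
    ' "$1" "$2"
}

# Possible usage (file names are placeholders):
#   taxonkit list --ids 12908 --indent "" > list.txt
#   exclude_taxids list.txt nodes.dmp > nodes.new
```

Since names.dmp also carries the taxid in its first column, the same filter could presumably be applied to it so that both files stay consistent.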

shenwei356 commented 3 weeks ago

How's it going?