shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Ranks of interest for taxonkit lineage #20

Closed (johanneswerner closed this issue 5 years ago)

johanneswerner commented 5 years ago

I work mostly with bacteria, and especially for uncultured bacteria the lineage contains many entries whose taxonomic rank is "no rank". Is it possible to specify which ranks I want to get back? I am thinking of an option such as

--filter-ranks superkingdom,phylum,class,order,family,genus

Please let me know if my issue description is clear enough.

shenwei356 commented 5 years ago

taxonkit lineage has an option to append a rank column to the result:

  -r, --show-rank             show rank of taxids

Example:

$ echo -ne "11932\n34101\n" | taxonkit lineage --show-rank
11932   Viruses;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle  species
34101   cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Planococcaceae;Filibacter;Filibacter limicola     species

You can then filter the results by the rank column, e.g.:

taxonkit lineage --show-rank | grep -v "no rank"
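
Note that a plain grep matches "no rank" anywhere on the line, not only in the rank column. A minimal sketch of a stricter filter that looks only at the last column might look like the following. Here `sample_output` and `filter_by_rank` are hypothetical helper names, not part of taxonkit, and the sample rows are the two lines from the example output above, assumed to be tab-separated.

```shell
# Sample `taxonkit lineage --show-rank` output copied from the example above
# (assumed tab-separated: taxid, lineage, rank).
sample_output() {
    printf '%s\t%s\t%s\n' \
        11932 'Viruses;Ortervirales;Retroviridae;unclassified Retroviridae;Intracisternal A-particles;Mouse Intracisternal A-particle' species \
        34101 'cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Planococcaceae;Filibacter;Filibacter limicola' species
}

# filter_by_rank is a hypothetical helper, not a taxonkit command.
# It keeps only rows whose last column (the rank) appears in the
# comma-separated list given as the first argument.
filter_by_rank() {
    awk -F '\t' -v ranks="$1" '
        BEGIN { n = split(ranks, r, ","); for (i = 1; i <= n; i++) keep[r[i]] = 1 }
        ($NF in keep)
    '
}

# Keep rows whose taxid has rank genus or species.
sample_output | filter_by_rank "genus,species"
```

With real data, the `sample_output` call would be replaced by `taxonkit lineage --show-rank`. This filters on the rank of the queried taxid only; it does not remove unranked nodes from inside the lineage string.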

johanneswerner commented 5 years ago

Hi and thank you for your answer.

What I am looking for are only specific ranks (superkingdom, phylum, class, order, family, genus and species) to subsequently create a table with taxid and the highest taxid and all descending taxa. Do you also have an idea on how to deal with that?

For instance, in the example above, Terrabacteria group should not appear (but if a rank does not exist, it should have an empty field or something like <not assigned>).

For instance, the taxon id 1612341 (Roholtiella) has no class level, but in the end I would like to end up with a table where each taxon rank is in the correct column.

Thank you for your assistance.

shenwei356 commented 5 years ago

> What I am looking for are only specific ranks (superkingdom, phylum, class, order, family, genus and species) to subsequently create a table with taxid and the highest taxid and all descending taxa. Do you also have an idea on how to deal with that?

Try combining taxonkit list --show-rank --indent "" and taxonkit lineage --show-rank?

> For instance, in the example above, Terrabacteria group should not appear (but if a rank does not exist, it should have an empty field or something like <not assigned>). For instance, the taxon id 1612341 (Roholtiella) has no class level, but in the end I would like to end up with a table where each taxon rank is in the correct column.

Is taxonkit reformat what you need?
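
For reference, a minimal sketch of how reformat could produce such a table. It assumes that taxonkit and the NCBI taxdump files are installed, and that your version supports the `-f/--format` and `-r/--miss-rank-repl` flags (check `taxonkit reformat --help`). `rank_table` is a hypothetical wrapper name, and the two taxids are the ones mentioned in this thread.

```shell
# One placeholder per wanted rank: superkingdom, phylum, class, order,
# family, genus, species. Placeholder letters are an assumption based on
# the reformat help text; verify them for your taxonkit version.
fmt='{k};{p};{c};{o};{f};{g};{s}'

# rank_table is a hypothetical wrapper: it reads taxids on stdin and prints
# the taxid plus the reformatted lineage. It assumes reformat appends the
# formatted lineage as the third tab-separated column.
rank_table() {
    taxonkit lineage \
        | taxonkit reformat -f "$fmt" -r '<not assigned>' \
        | cut -f 1,3
}

# Only run the demo when taxonkit is actually available.
if command -v taxonkit >/dev/null 2>&1; then
    printf '34101\n1612341\n' | rank_table
else
    echo "taxonkit not found; skipping the demo" >&2
fi
```

Ranks missing from a lineage (for example the class of 1612341) should then appear as `<not assigned>` in their position instead of shifting the other fields, and unranked nodes such as Terrabacteria group should be left out because the format string has no placeholder for them.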

johanneswerner commented 5 years ago

> Is taxonkit reformat what you need?

Perfect, thanks, solved all my problems. :-)