shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License
357 stars 29 forks

[Question] Graphical representation of taxdumps #95

Open fgvieira opened 3 months ago

fgvieira commented 3 months ago

Is it possible to generate a graphical representation of a taxdump?

Most taxdumps are too large to represent graphically, but when testing, users sometimes work with reduced taxdumps that are small enough to draw. For example, something that simply converts a taxdump to DOT format and then to SVG:

awk -F"\t" 'BEGIN{print "digraph G {"} NR==FNR{if($1!=$3) print $3" -> "$1";"; labels[$1]=$1"\\n"$5} NR!=FNR && $7=="scientific name"{print $1" [label=\""$3"\\n"labels[$1]"\"];"} END{print "}"}' nodes.dmp names.dmp | dot -Tsvg > taxdump.svg

Alternatively, to be able to work with the full taxdump, one could also think of an interactive HTML document similar to the one in Taxallnomy.
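For readers who prefer something easier to extend than a one-liner, the taxdump-to-DOT idea can be sketched in Python. This is only a rough sketch, not a taxonkit feature: `split_dmp` and `taxdump_to_dot` are hypothetical helper names, and it assumes the usual NCBI layout where columns are separated by tab, pipe, tab, with tax ID, parent ID and rank first in nodes.dmp and tax ID, name and name class in columns 1, 2 and 4 of names.dmp. The sample lines at the bottom are made-up toy data.

```python
def split_dmp(line):
    """Split one taxdump line into its columns."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):      # every row ends with a tab and a pipe
        line = line[:-2]
    return line.split("\t|\t")


def taxdump_to_dot(nodes_lines, names_lines):
    """Return DOT text for the tree described by nodes.dmp/names.dmp lines."""
    parent, rank = {}, {}
    for line in nodes_lines:
        if line.strip():
            cols = split_dmp(line)
            parent[cols[0]] = cols[1]
            rank[cols[0]] = cols[2]

    name = {}
    for line in names_lines:
        if line.strip():
            cols = split_dmp(line)
            # Keep only the scientific name; synonyms etc. are skipped.
            if cols[3] == "scientific name":
                name[cols[0]] = cols[1]

    out = ["digraph G {"]
    for taxid, par in parent.items():
        # "\\n" emits a literal backslash-n, which DOT renders as a line break.
        label = f"{name.get(taxid, '?')}\\n{taxid}\\n{rank[taxid]}"
        label = label.replace('"', '\\"')   # escape quotes inside names
        out.append(f'  {taxid} [label="{label}"];')
        if par != taxid:                    # the root is its own parent
            out.append(f"  {par} -> {taxid};")
    out.append("}")
    return "\n".join(out)


# Toy input in taxdump layout (not real data files).
sample_nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tsuperkingdom\t|\n",
]
sample_names = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n",
    "2\t|\teubacteria\t|\t\t|\tgenbank common name\t|\n",
]
print(taxdump_to_dot(sample_nodes, sample_names))
```

With real files, the two arguments could be open file handles for nodes.dmp and names.dmp, and the printed text could be piped to `dot -Tsvg` as in the command above.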

shenwei356 commented 3 months ago

The network (tree) is too big to show.

fgvieira commented 3 months ago

Yes, if we try to represent everything. But the Taxallnomy page I linked above shows only part of the tree at a time.
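To make the "part of the tree at a time" idea concrete, here is a minimal Python sketch that collects only the edges within a few levels below a chosen taxon. It is an illustration under assumptions, not how Taxallnomy or taxonkit work: `subtree_edges` is a hypothetical helper, `parent` is assumed to be a child-to-parent mapping already read from nodes.dmp, and the tax IDs in the toy mapping are invented.

```python
from collections import defaultdict


def subtree_edges(parent, root, max_depth=2):
    """Return (parent, child) edges at most max_depth levels below root."""
    # Invert the child -> parent map into parent -> children.
    children = defaultdict(list)
    for child, par in parent.items():
        if child != par:          # skip the root's self-reference
            children[par].append(child)

    edges, frontier = [], [root]
    for _ in range(max_depth):    # breadth-first, one level per iteration
        next_frontier = []
        for node in frontier:
            for child in sorted(children.get(node, [])):
                edges.append((node, child))
                next_frontier.append(child)
        frontier = next_frontier
    return edges


# Toy child -> parent mapping with invented IDs.
toy_parent = {"1": "1", "10": "1", "20": "1", "11": "10", "12": "10", "111": "11"}

for par, child in subtree_edges(toy_parent, "10", max_depth=2):
    print(f"{par} -> {child};")
```

Only the returned edges would need to be drawn, so the size of the picture depends on the chosen taxon and depth rather than on the size of the full taxdump.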