poursalavati closed this issue 1 year ago.
The clade rank can appear anywhere in the taxonomic tree, so it isn't specific.
Thanks. In NCBI format, it's the 2nd rank, as in this virus lineage:
GCA_000320725.1 Varidnaviria Bamfordvirae Nucleocytoviricota Megaviricetes Imitervirales Mimiviridae unclassified Mimiviridae genus Acanthamoeba polyphaga lentillevirus
Varidnaviria cannot be exported when using:
./taxonkit reformat -I 3 --data-dir ../taxdump/ joined -F -f "{K}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}"
Current output:
GCA_000320725.1 Bamfordvirae Nucleocytoviricota Megaviricetes Imitervirales Mimiviridae unclassified Mimiviridae genus Acanthamoeba polyphaga lentillevirus
I'm looking for a placeholder that can extract Varidnaviria (clade rank).
I see. It seems there is always a clade between superkingdom and kingdom. However, thousands of taxa also have two clades, which makes clade ambiguous.
$ echo "2506204" \
| taxonkit lineage -t \
| csvtk cut -Ht -f 3 \
| csvtk unfold -Ht -f 1 -s ";" \
| taxonkit lineage -r -n -L \
| csvtk cut -Ht -f 1,3,2 \
| csvtk pretty -Ht
10239 superkingdom Viruses
2731342 clade Monodnaviria
2732092 kingdom Shotokuvirae
2732415 phylum Cossaviricota
2732421 class Papovaviricetes
2732533 order Zurhausenvirales
151340 family Papillomaviridae
333774 no rank unclassified Papillomaviridae
333933 clade primate papillomaviruses
2506204 species Macaca fuscata papillomavirus 2
$ taxonkit list --ids 1 \
| taxonkit filter -L species -E species \
| taxonkit lineage -R \
| grep clade \
| pigz -c \
> clades.gz

$ zcat clades.gz \
| grep Viruses \
| grep -E "clade.*clade" \
| wc -l
17888
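Because the realm-level clade sits directly below superkingdom (Monodnaviria in the Papillomaviridae lineage above), one way to sidestep the ambiguity is to take only the clade that immediately follows superkingdom and ignore deeper ones such as "primate papillomaviruses". The sketch below does this with awk. It assumes each input line has three tab-separated columns (taxid, lineage names, lineage ranks, each list separated by ";"), which is the layout assumed here for `taxonkit lineage -R` output; the function name `realm_clade` is hypothetical, not part of TaxonKit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a TaxonKit command): realm_clade
# Input (assumed): taxid <TAB> lineage names <TAB> lineage ranks,
# with names and ranks separated by ";" as in `taxonkit lineage -R` output.
# Output: the input line plus one extra column holding the name of the clade
# that sits directly below superkingdom, or an empty field if there is none.
realm_clade() {
    awk -F '\t' -v OFS='\t' '{
        n = split($2, names, ";")   # lineage names
        m = split($3, ranks, ";")   # matching lineage ranks
        realm = ""
        # Only the clade right after superkingdom counts; deeper clades are ignored.
        for (i = 1; i < m && i < n; i++) {
            if (ranks[i] == "superkingdom" && ranks[i + 1] == "clade") {
                realm = names[i + 1]
                break
            }
        }
        print $0, realm
    }'
}
```

Assuming the column layout above, usage would look like `echo 2506204 | taxonkit lineage -R | realm_clade`, which should append Monodnaviria for the example lineage shown earlier. Anchoring on the superkingdom position rather than on the rank name alone is what keeps lineages with two clades unambiguous.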
Thanks Wei. Yes, you're right. In this case, it seems NCBI should change its rank behavior, since for this example we need a "Realm" rank instead of a clade (based on ICTV). Anyway, I wrote a script that fixes it, albeit in an ugly way: it adds a new column based on the kingdom and writes the appropriate Realm (clade) name. Best, NP
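A kingdom-to-realm lookup like the one described above could be sketched as follows. This is not NP's actual script; the function name `add_realm`, its column argument, and the default of column 2 are assumptions for illustration. The lookup table is a partial ICTV kingdom-to-realm mapping that covers the realms mentioned in this thread plus a few others, and it would need extending for other kingdoms.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not NP's script): add_realm [KINGDOM_COLUMN]
# Reads a TSV stream and inserts a realm column just before the kingdom column,
# looking the realm up from the kingdom name. KINGDOM_COLUMN is 1-based;
# the default of 2 is an assumption, so pass the column where {K} lands.
add_realm() {
    local kcol="${1:-2}"
    awk -F '\t' -v OFS='\t' -v kcol="$kcol" '
    BEGIN {
        # Partial ICTV mapping: kingdom -> realm (extend as needed)
        realm["Bamfordvirae"]   = "Varidnaviria"
        realm["Helvetiavirae"]  = "Varidnaviria"
        realm["Heunggongvirae"] = "Duplodnaviria"
        realm["Loebvirae"]      = "Monodnaviria"
        realm["Sangervirae"]    = "Monodnaviria"
        realm["Shotokuvirae"]   = "Monodnaviria"
        realm["Trapavirae"]     = "Monodnaviria"
        realm["Orthornavirae"]  = "Riboviria"
        realm["Pararnavirae"]   = "Riboviria"
    }
    {
        k = $kcol
        r = (k in realm) ? realm[k] : ""   # empty realm when the kingdom is unknown or missing
        $kcol = r OFS k                    # prepend the realm to the kingdom field
        print
    }'
}
```

For the lineage in the question, `printf 'GCA_000320725.1\tBamfordvirae\tNucleocytoviricota\n' | add_realm 2` should yield `GCA_000320725.1`, `Varidnaviria`, `Bamfordvirae`, `Nucleocytoviricota` as tab-separated columns. Keying on the kingdom works here because each kingdom belongs to exactly one realm, but a lineage with no kingdom assigned will get an empty realm field.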
Hi Wei,
I was wondering whether there is a placeholder for the clade rank.
I'm using this command and would like 8 columns in the output (the clade rank is currently missing):
Here is part of the input file (joined):