shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

filtering ranks #32

Closed shenwei356 closed 3 years ago

shenwei356 commented 4 years ago
Ranks in nodes.dmp:

```
superkingdom
subkingdom
kingdom
superphylum
phylum
subphylum
superclass
class
subclass
infraclass
cohort
subcohort
superorder
order
suborder
infraorder
parvorder
superfamily
family
subfamily
tribe
subtribe
genus
subgenus
section
subsection
series
species group
species subgroup
species
subspecies
varietas
forma
no rank
```
shenwei356 commented 4 years ago
Wikipedia:

# https://en.wikipedia.org/wiki/Taxonomic_rank
hyperkingdom superkingdom kingdom subkingdom infrakingdom parvkingdom
superphylum phylum subphylum infraphylum microphylum
superclass class subclass infraclass parvclass
superdivision division subdivision infradivision
superlegion legion sublegion infralegion
supercohort cohort subcohort infracohort
gigaorder magnorder grandorder mirorder superorder series order # parvorder nanorder hypoorder minorder suborder infraorder parvorder # section # subsection
gigafamily megafamily grandfamily hyperfamily superfamily epifamily # series group family subfamily infrafamily
supertribe tribe subtribe infratribe
genus subgenus section subsection series subseries
superspecies species subspecies
varietas variety subvarietas subvariety
forma form subforma subform
no rank
shenwei356 commented 4 years ago

Taxids can be filtered by taxonomic rank, similar to unikmer rfilter, e.g.:

--higher-than genus
--higher-than genus --equal-to genus
--lower-than species --equal-to species
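
To make the intended semantics of these options concrete, here is a minimal sketch of how such a filter could combine "higher than", "lower than" and "equal to" against an ordered rank list. It is not taxonkit's implementation: the `rank_filter` function, the `RANKS` variable and the abbreviated rank order are all hypothetical, and silently skipping ranks that are missing from the list is an assumption.

```shell
# Hypothetical rank order, highest first (abbreviated; an assumption).
RANKS="superkingdom kingdom phylum class order family genus species subspecies"

# rank_filter MODES RANK
# Reads "taxid<TAB>rank" lines on stdin and prints those whose rank satisfies
# any of the comma-separated MODES (higher, lower, equal) relative to RANK.
# Ranks that are not in RANKS are skipped.
rank_filter() {
    awk -F '\t' -v modes="$1" -v target="$2" -v ranks="$RANKS" '
        BEGIN {
            n = split(ranks, r, " ")
            for (i = 1; i <= n; i++) order[r[i]] = i   # smaller index = higher rank
            m = split(modes, ms, ",")
            for (i = 1; i <= m; i++) want[ms[i]] = 1
        }
        ($2 in order) {
            d = order[$2] - order[target]
            if ((d < 0  && ("higher" in want)) ||
                (d > 0  && ("lower"  in want)) ||
                (d == 0 && ("equal"  in want))) print
        }
    '
}

# Mimics "--higher-than genus --equal-to genus" on a few taxids from this thread.
printf '2759\tsuperkingdom\n86520\tgenus\n153215\tspecies\n' \
    | rank_filter higher,equal genus
```

Combining modes as a union matches the examples above, where two options together mean "higher than or equal to".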
shenwei356 commented 4 years ago

The rank clade appeared in nodes.dmp recently.

Clade is not a taxonomic rank; it is a description of a group of organisms. https://www.quora.com/Between-genus-family-phylum-class-etc-where-does-clade-fit-in

Example:

echo 153215 \
    | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' \
    | taxonkit lineage -r \
    | csvtk -Ht cut -f 1,3,2 \
    | csvtk pretty -t

131567    no rank        cellular organisms
2759      superkingdom   cellular organisms;Eukaryota
33154     clade          cellular organisms;Eukaryota;Opisthokonta
33208     kingdom        cellular organisms;Eukaryota;Opisthokonta;Metazoa
6072      clade          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa
6073      phylum         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria
6101      class          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa
6132      subclass       cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia
40677     order          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea
723794    clade          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea;Stolonifera
86519     family         cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea;Stolonifera;Clavulariidae
86520     genus          cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea;Stolonifera;Clavulariidae;Clavularia
2635089   no rank        cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea;Stolonifera;Clavulariidae;Clavularia;unclassified Clavularia
153215    species        cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Octocorallia;Alcyonacea;Stolonifera;Clavulariidae;Clavularia;unclassified Clavularia;Clavularia sp. Br-1
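
One way to spot ranks such as clade is to tally the rank column of nodes.dmp. The sketch below assumes the usual dump layout, where fields are separated by a tab, a pipe and a tab, so the rank is the fifth tab-separated column. The `count_ranks` helper is hypothetical, and the sample file is a trimmed stand-in built from three taxids in the lineage above (real rows carry more columns).

```shell
# count_ranks FILE
# Prints each distinct rank in a nodes.dmp-style file with its number of
# occurrences, most frequent first.
count_ranks() {
    cut -f 5 "$1" | sort | uniq -c | sort -rn
}

# Trimmed stand-in for nodes.dmp: taxid | parent taxid | rank |
sample=$(mktemp)
printf '33154\t|\t2759\t|\tclade\t|\n6072\t|\t33208\t|\tclade\t|\n33208\t|\t33154\t|\tkingdom\t|\n' > "$sample"

count_ranks "$sample"
```

Running the same function on a real nodes.dmp before and after a taxonomy update would show which ranks are new.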
shenwei356 commented 4 years ago

So does strain:

$ echo 349741 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -t
131567    no rank        cellular organisms
2         superkingdom   cellular organisms;Bacteria
1783257   clade          cellular organisms;Bacteria;PVC group
74201     phylum         cellular organisms;Bacteria;PVC group;Verrucomicrobia
203494    class          cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae
48461     order          cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales
1647988   family         cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae
239934    genus          cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia
239935    species        cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila
349741    strain         cellular organisms;Bacteria;PVC group;Verrucomicrobia;Verrucomicrobiae;Verrucomicrobiales;Akkermansiaceae;Akkermansia;Akkermansia muciniphila;Akkermansia muciniphila ATCC BAA-835
shenwei356 commented 4 years ago

serotype:

$ echo 573235 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -t
131567   no rank        cellular organisms
2        superkingdom   cellular organisms;Bacteria
1224     phylum         cellular organisms;Bacteria;Proteobacteria
1236     class          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria
91347    order          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales
543      family         cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae
561      genus          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia
562      species        cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
244319   serotype       cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli O26:H11
573235   strain         cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli O26:H11;Escherichia coli O26:H11 str. 11368
shenwei356 commented 4 years ago

isolate:

$ echo 5838 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -t
131567    no rank        cellular organisms
2759      superkingdom   cellular organisms;Eukaryota
2698737   clade          cellular organisms;Eukaryota;Sar
33630     clade          cellular organisms;Eukaryota;Sar;Alveolata
5794      phylum         cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa
422676    class          cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida
5819      order          cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida
1639119   family         cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae
5820      genus          cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium
418107    subgenus       cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium;Plasmodium (Laverania)
5833      species        cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium;Plasmodium (Laverania);Plasmodium falciparum
5838      isolate        cellular organisms;Eukaryota;Sar;Alveolata;Apicomplexa;Aconoidasida;Haemosporida;Plasmodiidae;Plasmodium;Plasmodium (Laverania);Plasmodium falciparum;Plasmodium falciparum FCR-3/Gambia
shenwei356 commented 4 years ago

biotype:

$ echo 725 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -t
131567   no rank        cellular organisms
2        superkingdom   cellular organisms;Bacteria
1224     phylum         cellular organisms;Bacteria;Proteobacteria
1236     class          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria
135625   order          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales
712      family         cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae
724      genus          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus
727      species        cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;Haemophilus influenzae
725      biotype        cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Pasteurellales;Pasteurellaceae;Haemophilus;Haemophilus influenzae;Haemophilus influenzae biotype aegyptius
shenwei356 commented 4 years ago

serogroup:

131567   no rank        cellular organisms
2        superkingdom   cellular organisms;Bacteria
1224     phylum         cellular organisms;Bacteria;Proteobacteria
28216    class          cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria
206351   order          cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales
481      family         cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae
482      genus          cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria
487      species        cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;Neisseria meningitidis
491      serogroup      cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;Neisseria meningitidis;Neisseria meningitidis serogroup B
122586   strain         cellular organisms;Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria;Neisseria meningitidis;Neisseria meningitidis serogroup B;Neisseria meningitidis MC58
shenwei356 commented 4 years ago

genotype:

$ echo 356418 | taxonkit lineage -t | cut -f 3 | sed 's/;/\n/g' | taxonkit lineage -r | csvtk -Ht cut -f 1,3,2 | csvtk pretty -t
10239     superkingdom   Viruses
2559587   clade          Viruses;Riboviria
2732396   kingdom        Viruses;Riboviria;Orthornavirae
2732406   phylum         Viruses;Riboviria;Orthornavirae;Kitrinoviricota
2732462   class          Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes
2732545   order          Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales
11050     family         Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae
11102     genus          Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae;Hepacivirus
11103     species        Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae;Hepacivirus;Hepacivirus C
33745     genotype       Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae;Hepacivirus;Hepacivirus C;Hepatitis C virus genotype 4
31653     no rank        Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae;Hepacivirus;Hepacivirus C;Hepatitis C virus genotype 4;Hepatitis C virus subtype 4a
356418    strain         Viruses;Riboviria;Orthornavirae;Kitrinoviricota;Flasuviricetes;Amarillovirales;Flaviviridae;Hepacivirus;Hepacivirus C;Hepatitis C virus genotype 4;Hepatitis C virus subtype 4a;Hepatitis C virus ED43
shenwei356 commented 4 years ago

morph:

131567   no rank        cellular organisms
2759     superkingdom   cellular organisms;Eukaryota
33154    clade          cellular organisms;Eukaryota;Opisthokonta
4751     kingdom        cellular organisms;Eukaryota;Opisthokonta;Fungi
451864   subkingdom     cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya
5204     phylum         cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota
5302     subphylum      cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina
155619   class          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes
452333   subclass       cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae
5338     order          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae;Agaricales
104366   family         cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae;Agaricales;Pleurotaceae
5320     genus          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae;Agaricales;Pleurotaceae;Pleurotus
5322     species        cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae;Agaricales;Pleurotaceae;Pleurotus;Pleurotus ostreatus
188765   morph          cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetidae;Agaricales;Pleurotaceae;Pleurotus;Pleurotus ostreatus;Pleurotus sp. 'Florida'
shenwei356 commented 4 years ago

pathogroup:

131567    no rank        cellular organisms
2         superkingdom   cellular organisms;Bacteria
1224      phylum         cellular organisms;Bacteria;Proteobacteria
1236      class          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria
135614    order          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales
32033     family         cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae
338       genus          cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Xanthomonas
53413     species        cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Xanthomonas;Xanthomonas axonopodis
1982676   pathogroup     cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Xanthomonas;Xanthomonas axonopodis;Xanthomonas axonopodis subcluster 9.1
473403    no rank        cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Xanthomonadales;Xanthomonadaceae;Xanthomonas;Xanthomonas axonopodis;Xanthomonas axonopodis subcluster 9.1;Xanthomonas axonopodis pv. begoniae