shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Taxon taxid reassigned with reformat #42

Closed standage closed 3 years ago

standage commented 3 years ago

Hello, I noticed some unexpected behavior today. When I query and reformat the lineage for taxid 2507530, taxonkit reformat re-assigns 2516889 as the taxid in the output (the last taxid in the line).

$ echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
$ echo 2516889 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
2516889 2516889 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019    131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516889  Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889

It looks like these may be duplicated, unmerged taxids.

$ grep -e 2507530 -e 2516889 ~/.taxonkit/names.dmp 
2507530 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2507530>        |       scientific name |
2516889 |       Russula sp. 8 KA-2019   |       Russula sp. 8 KA-2019 <NCBI:txid2516889>        |       scientific name |
$ grep 2516889 ~/.taxonkit/merged.dmp
$

Obviously, we should hope NCBI fixes this in the taxdump soon. But I'm assuming this is not the intended taxonkit behavior?


shenwei356 commented 3 years ago

> It looks like these may be duplicated, unmerged taxids.

Yes, they are. They should be merged.

taxonkit reformat parses the complete lineage instead of reading TaxIds and querying the lineage in real time, so that it also works when TaxIds are not available. It retrieves the TaxId of each taxon node from the combination of its name and its parent's name, which eliminates name ambiguity.

However, 2507530 and 2516889 have exactly the same lineage :( so reformat fails to distinguish them.
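
To make the collision concrete, here is a minimal sketch of a lookup keyed on (child name, parent name). It is not TaxonKit's actual code: `build_index` and the tuple layout are hypothetical, and the rule that the later entry wins is only how a plain dict behaves in this sketch.

```python
# Hypothetical illustration only, not TaxonKit's implementation.
# Each node is (taxid, scientific name, parent's scientific name).

def build_index(nodes):
    """Map (child name, parent name) -> taxid.

    A plain dict keeps one value per key, so a second node with the
    same name pair silently replaces the first one.
    """
    index = {}
    for taxid, name, parent_name in nodes:
        index[(name, parent_name)] = taxid
    return index


nodes = [
    (2507530, "Russula sp. 8 KA-2019", "unclassified Russula"),
    (2516889, "Russula sp. 8 KA-2019", "unclassified Russula"),
]

index = build_index(nodes)

# Both TaxIds share one key, so only one of them can be recovered
# from the names alone.
resolved = index[("Russula sp. 8 KA-2019", "unclassified Russula")]
print(len(index), resolved)
```

Whatever the real data structure is, any mapping from names back to TaxIds has the same limitation once two nodes carry identical names along the whole lineage.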

One solution is to add an option for specifying the TaxId field, for cases where TaxIds are available. Meanwhile, TaxIds that share the same complete lineage should be detected while parsing the taxdump files.
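
The detection step could look roughly like the sketch below, which groups TaxIds by (child name, parent name) using lines from names.dmp and nodes.dmp. The helper names `split_dmp` and `find_duplicate_name_pairs` are hypothetical, the sample lines are abbreviated (real nodes.dmp rows have more columns), and this is an assumption about one possible approach rather than what TaxonKit does.

```python
from collections import defaultdict


def split_dmp(line):
    """Split a taxdump row on '|' and strip surrounding whitespace."""
    return [field.strip() for field in line.rstrip("\n").split("|")]


def find_duplicate_name_pairs(names_lines, nodes_lines):
    """Return {(child name, parent name): [taxids]} for pairs used more than once."""
    parent = {}
    for line in nodes_lines:
        fields = split_dmp(line)
        parent[fields[0]] = fields[1]  # taxid -> parent taxid

    sci_name = {}
    for line in names_lines:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            sci_name[fields[0]] = fields[1]  # taxid -> scientific name

    groups = defaultdict(list)
    for taxid, name in sci_name.items():
        parent_id = parent.get(taxid)
        if parent_id is None or parent_id not in sci_name:
            continue  # parent not present in this (partial) input
        groups[(name, sci_name[parent_id])].append(taxid)

    return {k: sorted(v, key=int) for k, v in groups.items() if len(v) > 1}


# Abbreviated sample rows based on the TaxIds discussed in this issue.
names = [
    "2507530 | Russula sp. 8 KA-2019 | Russula sp. 8 KA-2019 <NCBI:txid2507530> | scientific name |",
    "2516889 | Russula sp. 8 KA-2019 | Russula sp. 8 KA-2019 <NCBI:txid2516889> | scientific name |",
    "2602424 | unclassified Russula |  | scientific name |",
]
nodes = [
    "2507530 | 2602424 | species |",
    "2516889 | 2602424 | species |",
    "2602424 | 5402 | no rank |",
]

duplicates = find_duplicate_name_pairs(names, nodes)
for pair, taxids in duplicates.items():
    print(pair, taxids)
```

A shared (child, parent) name pair does not always mean the full lineages are identical, so a stricter check would compare the complete lineage for each candidate group before reporting it.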

shenwei356 commented 3 years ago

There are 52 more cases.

child,parent                                       taxid1,taxid2
------------------------------------------------   ----------------

Russula sp. 12 KA-2019, unclassified Russula       2507523, 2516885
Russula sp. 14 KA-2019, unclassified Russula       2507524, 2516886
Russula sp. 15 KA-2019, unclassified Russula       2516887, 2507525
Russula sp. 1 KA-2019, unclassified Russula        2516884, 2507521
Russula sp. 5 KA-2019, unclassified Russula        2516888, 2507527
Russula sp. 8 KA-2019, unclassified Russula        2516889, 2507530 
more cases

```
child,parent                                       taxid1,taxid2
------------------------------------------------   -----------------
Chiropsoides, Chiropsalmidae                       1105130, 2777044
clinical samples, environmental samples            88229, 191496
clinical samples, environmental samples            88229, 226901
environmental samples, Elusimicrobia               699875, 99260
environmental samples, Ichthyophonida              941404, 568718
environmental samples, Roseivirga                  543087, 927586
Listeria sp. FSL_L7-0091, unclassified Listeria    2718636, 2713500
Listeria sp. FSL_L7-0993, unclassified Listeria    2718628, 2713505
Listeria sp. FSL_L7-1447, unclassified Listeria    2718633, 2713603
Listeria sp. FSL_L7-1519, unclassified Listeria    2713502, 2718644
Listeria sp. FSL_L7-1582, unclassified Listeria    2718622, 2713504
Mansonella sp. CAM-9837, unclassified Mansonella   2697341, 2694888
Mansonella sp. CAM-9838, unclassified Mansonella   2697340, 2694887
Nemania aenea var. aureolutea, Nemania aenea       2779627, 109380
Penicillium citreoviride, Penicillium              1343377, 64494
Santalales incertae sedis, Santalales              2777525, 1649179
unclassified Acanthocephala, Acanthocephala        2685929, 1009550
unclassified Anisoptera, Anisoptera                1080974, 2685930
unclassified Antipatharia, Antipatharia            2750883, 44307
unclassified Bergia, Bergia                        2648616, 2727417
unclassified Cephalothrix, Cephalothrix            2664281, 2741702
unclassified Chlorella, Chlorella                  1962113, 2661577
unclassified Digenea, Digenea                      2685935, 99681
unclassified Diplolepis, Diplolepis                2677181, 2607940
unclassified Diplotaxis, Diplotaxis                2658736, 2677274
unclassified Diplura, Diplura                      2677275, 212010
unclassified Dracaena, Dracaena                    2292738, 2677199
unclassified Drosophila, Drosophila                58312, 1931990
unclassified Fridericia, Fridericia                2728542, 2604067
unclassified Giardia, Giardia                      1463203, 2770049
unclassified Gonatopus, Gonatopus                  2677302, 2659230
unclassified Hypoderma, Hypoderma                  2664351, 2677412
unclassified Hyssopus, Hyssopus                    2508054, 2714215
unclassified Inga, Inga                            2320256, 2659449
unclassified Kurzia, Kurzia                        2659477, 2677456
unclassified Liparis, Liparis                      2609094, 2200772
unclassified Myrmecia, Myrmecia                    2172497, 2677688
unclassified Nitrospira, Nitrospira                1704022, 2652172
unclassified Periploca, Periploca                  2677757, 2660233
unclassified Ponera, Ponera                        2608256, 2677547
unclassified Senegalia, Senegalia                  2696007, 2677834
unclassified Stellaria, Stellaria                  2596711, 2677902
unclassified Tetraspora, Tetraspora                2604509, 2677711
unclassified Trentepohlia, Trentepohlia            2137841, 2661401
unclassified Vertebrata, Vertebrata                2662825, 2202232
unclassified Yersinia, Yersinia                    2653513, 2677931
```
more details ``` 1105130 genus Chiropsoides cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides 131567;2759;33154;33208;6072;6073;6137;655440;685045;1105130 2777044 genus Chiropsoides cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides 131567;2759;33154;33208;6072;6073;6137;655440;685045;2777044 2713500 species Listeria sp. FSL_L7-0091 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713500 2718636 species Listeria sp. FSL_L7-0091 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718636 2713505 species Listeria sp. FSL_L7-0993 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0993 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713505 2718628 species Listeria sp. FSL_L7-0993 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0993 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718628 2713603 species Listeria sp. FSL_L7-1447 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1447 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713603 2718633 species Listeria sp. FSL_L7-1447 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1447 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718633 2713502 species Listeria sp. 
FSL_L7-1519 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1519 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713502 2718644 species Listeria sp. FSL_L7-1519 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1519 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718644 2713504 species Listeria sp. FSL_L7-1582 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1582 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2713504 2718622 species Listeria sp. FSL_L7-1582 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-1582 131567;2;1783272;1239;91061;1385;186820;1637;2642072;2718622 2694888 species Mansonella sp. CAM-9837 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. CAM-9837 131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2694888 2694887 species Mansonella sp. CAM-9838 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. CAM-9838 131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2694887 2697341 species Mansonella sp. Cam-9837 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. 
Cam-9837 131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2697341 2697340 species Mansonella sp. Cam-9838 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Nematoda;Chromadorea;Rhabditida;Spirurina;Spiruromorpha;Filarioidea;Onchocercidae;Mansonella;unclassified Mansonella;Mansonella sp. Cam-9838 131567;2759;33154;33208;6072;33213;33317;1206794;6231;119089;6236;6274;2072716;6295;6296;42230;2647107;2697340 109380 varietas Nemania aenea var. aureolutea cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Sordariomycetes;Xylariomycetidae;Xylariales;Xylariaceae;Nemania;Nemania aenea;Nemania aenea var. aureolutea 131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147550;222545;37989;37990;109374;109375;109380 2779627 varietas Nemania aenea var. aureolutea cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Sordariomycetes;Xylariomycetidae;Xylariales;Xylariaceae;Nemania;Nemania aenea;Nemania aenea var. aureolutea 131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147550;222545;37989;37990;109374;109375;2779627 64494 species Penicillium citreoviride cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Eurotiales;Aspergillaceae;Penicillium;Penicillium citreoviride 131567;2759;33154;4751;451864;4890;716545;147538;716546;147545;451871;5042;1131492;5073;64494 1343377 species Penicillium citreoviride cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;Eurotiomycetes;Eurotiomycetidae;Eurotiales;Aspergillaceae;Penicillium;Penicillium citreoviride 131567;2759;33154;4751;451864;4890;716545;147538;716546;147545;451871;5042;1131492;5073;1343377 2507521 species Russula sp. 
1 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 1 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507521 2516884 species Russula sp. 1 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 1 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516884 2507527 species Russula sp. 5 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 5 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507527 2516888 species Russula sp. 5 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 5 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516888 2507530 species Russula sp. 8 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530 2516889 species Russula sp. 8 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516889 2507523 species Russula sp. 
12 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 12 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507523 2516885 species Russula sp. 12 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 12 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516885 2507524 species Russula sp. 14 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 14 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507524 2516886 species Russula sp. 14 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 14 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516886 2507525 species Russula sp. 15 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 15 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507525 2516887 species Russula sp. 15 KA-2019 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 
15 KA-2019 131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2516887 1649179 no rank Santalales incertae sedis cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Santalales;Santalales incertae sedis 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;41947;1649179 2777525 no rank Santalales incertae sedis cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Santalales;Santalales incertae sedis 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;41947;2777525 88229 no rank clinical samples cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Chlamydiales;environmental samples;clinical samples 131567;2;1783257;204428;204429;51291;95916;88229 88229 no rank clinical samples cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Chlamydiales;environmental samples;clinical samples 131567;2;1783257;204428;204429;51291;95916;88229 191496 no rank clinical samples cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Parachlamydiales;Parachlamydiaceae;environmental samples;clinical samples 131567;2;1783257;204428;204429;1963360;92713;141644;191496 226901 no rank clinical samples cellular organisms;Bacteria;PVC group;Chlamydiae;Chlamydiia;Parachlamydiales;Parachlamydiaceae;Neochlamydia;environmental samples;clinical samples 131567;2;1783257;204428;204429;1963360;92713;112987;212217;226901 99260 no rank environmental samples cellular organisms;Bacteria;Elusimicrobia;environmental samples 131567;2;74152;99260 543087 no rank environmental samples cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi 
group;Bacteroidetes;Cytophagia;Cytophagales;Roseivirgaceae;Roseivirga;environmental samples 131567;2;1783270;68336;976;768503;768507;2762306;290180;543087 568718 no rank environmental samples cellular organisms;Eukaryota;Opisthokonta;Ichthyosporea;Ichthyophonida;environmental samples 131567;2759;33154;127916;198625;568718 699875 no rank environmental samples cellular organisms;Bacteria;Elusimicrobia;Elusimicrobia;environmental samples 131567;2;74152;641853;699875 927586 no rank environmental samples cellular organisms;Bacteria;FCB group;Bacteroidetes/Chlorobi group;Bacteroidetes;Cytophagia;Cytophagales;Roseivirgaceae;Roseivirga;environmental samples 131567;2;1783270;68336;976;768503;768507;2762306;290180;927586 941404 no rank environmental samples cellular organisms;Eukaryota;Opisthokonta;Ichthyosporea;Ichthyophonida;environmental samples 131567;2759;33154;127916;198625;941404 1009550 no rank unclassified Acanthocephala cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Acanthocephala;unclassified Acanthocephala 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;10232;1009550 2685929 no rank unclassified Acanthocephala cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Paraneoptera;Hemiptera;Prosorrhyncha;Heteroptera;Euheteroptera;Neoheteroptera;Panheteroptera;Pentatomomorpha;Coreoidea;Coreidae;Coreinae;Acanthocephala;unclassified Acanthocephala 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33342;7524;33343;33345;33347;33349;33351;33357;38105;186376;2068237;2316800;2685929 1080974 no rank unclassified Anisoptera cellular 
organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Palaeoptera;Odonata;Epiprocta;Anisoptera;unclassified Anisoptera 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33339;6961;2510002;6962;1080974 2685930 no rank unclassified Anisoptera cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids;Malvales;Dipterocarpaceae;Dipterocarpoideae;Anisoptera;unclassified Anisoptera 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91836;41938;40588;65009;64577;2685930 44307 no rank unclassified Antipatharia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Antipatharia;unclassified Antipatharia 131567;2759;33154;33208;6072;6073;6101;6102;44168;44307 2750883 no rank unclassified Antipatharia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Antipatharia;unclassified Antipatharia 131567;2759;33154;33208;6072;6073;6101;6102;44168;2750883 2648616 no rank unclassified Bergia cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Malpighiales;Elatinaceae;Bergia;unclassified Bergia 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;3646;125023;125024;2648616 2727417 no rank unclassified Bergia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Anthozoa;Hexacorallia;Zoantharia;Parazoanthidae;Bergia;unclassified Bergia 131567;2759;33154;33208;6072;6073;6101;6102;44927;44928;2723760;2727417 2664281 no rank unclassified Cephalothrix 
cellular organisms;Bacteria;Terrabacteria group;Cyanobacteria/Melainabacteria group;Cyanobacteria;Oscillatoriophycideae;Oscillatoriales;Coleofasciculaceae;Cephalothrix;unclassified Cephalothrix 131567;2;1783272;1798711;1117;1301283;1150;1892251;1844514;2664281 2741702 no rank unclassified Cephalothrix cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Nemertea;Palaeonemertea;Cephalothricidae;Cephalothrix;unclassified Cephalothrix 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6217;1684132;166040;166041;2741702 1962113 no rank unclassified Chlorella cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Trebouxiophyceae;Chlorellales;Chlorellaceae;Chlorella clade;Chlorella;unclassified Chlorella 131567;2759;33090;3041;2692248;75966;35460;35461;2511126;3071;1962113 2661577 no rank unclassified Chlorella cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Trebouxiophyceae;Trebouxiophyceae incertae sedis;Chlorella;unclassified Chlorella 131567;2759;33090;3041;2692248;75966;75981;114055;2661577 99681 no rank unclassified Digenea cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Platyhelminthes;Trematoda;Digenea;unclassified Digenea 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6157;6178;6179;99681 2685935 no rank unclassified Digenea cellular organisms;Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniophycidae;Ceramiales;Rhodomelaceae;Polysiphonioideae;Digenea;unclassified Digenea 131567;2759;2763;2806;2045261;2802;2803;2008651;256429;2685935 2607940 no rank unclassified Diplolepis cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Parasitoida;Cynipoidea;Cynipidae;Cynipinae;Diplolepidini;Diplolepis;unclassified Diplolepis 
131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;1955251;40307;73401;1159319;167046;73404;2607940 2677181 no rank unclassified Diplolepis cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Gentianales;Apocynaceae;Asclepiadoideae;Asclepiadeae;MOOG clade;Diplolepinae;Diplolepis;unclassified Diplolepis 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4055;4056;167484;167488;2546561;1498481;274548;2677181 2658736 no rank unclassified Diplotaxis cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Coleoptera;Polyphaga;Scarabaeiformia;Scarabaeoidea;Scarabaeidae;Melolonthinae;Diplotaxis;unclassified Diplotaxis 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7041;41084;41086;75546;7055;7059;1710485;2658736 2677274 no rank unclassified Diplotaxis cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;malvids;Brassicales;Brassicaceae;Brassiceae;Diplotaxis;unclassified Diplotaxis 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91836;3699;3700;981071;3731;2677274 212010 no rank unclassified Diplura cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Diplura;unclassified Diplura 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;29997;212010 2677275 no rank unclassified Diplura cellular 
organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Chelicerata;Arachnida;Araneae;Mygalomorphae;Dipluridae;Diplura;unclassified Diplura 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;6843;6854;6893;6894;88327;371957;2677275 2292738 no rank unclassified Dracaena cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Asparagaceae;Nolinoideae;Dracaena;unclassified Dracaena 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;40552;703537;39502;2292738 2677199 no rank unclassified Dracaena cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Sarcopterygii;Dipnotetrapodomorpha;Tetrapoda;Amniota;Sauropsida;Sauria;Lepidosauria;Squamata;Bifurcata;Unidentata;Episquamata;Laterata;Teiioidea;Teiidae;Dracaena;unclassified Dracaena 131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;8287;1338369;32523;32524;8457;32561;8504;8509;1329961;1329950;1329912;1329976;35036;8530;420544;2677199 58312 no rank unclassified Drosophila cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophila;unclassified Drosophila 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7215;58312 1931990 no rank unclassified Drosophila cellular 
organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Acalyptratae;Ephydroidea;Drosophilidae;Drosophilinae;Drosophilini;Drosophila;Drosophila;unclassified Drosophila 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43741;43746;7214;43845;46877;7215;32281;1931990 2604067 no rank unclassified Fridericia cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Lamiales;Bignoniaceae;Bignonieae;Fridericia;unclassified Fridericia 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4143;24079;423302;354074;2604067 2728542 no rank unclassified Fridericia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Annelida;Clitellata;Oligochaeta;Enchytraeida;Enchytraeidae;Fridericia;unclassified Fridericia 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6340;42113;6381;1964463;6388;77730;2728542 1463203 no rank unclassified Giardia cellular organisms;Eukaryota;Metamonada;Fornicata;Diplomonadida;Hexamitidae;Giardiinae;Giardia;unclassified Giardia 131567;2759;2611341;207245;5738;5739;68459;5740;1463203 2770049 no rank unclassified Giardia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Mollusca;Gastropoda;Heterobranchia;Euthyneura;Panpulmonata;Eupulmonata;Stylommatophora;Helicina;Camaenoidea;Camaenidae;Giardia;unclassified Giardia 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6447;6448;216305;216307;977775;120490;6527;216366;87864;83226;2770048;2770049 2659230 
no rank unclassified Gonatopus cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Chrysidoidea;Dryinidae;Gonatopodinae;Gonatopus;unclassified Gonatopus 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;40304;144390;2326770;216179;2659230 2677302 no rank unclassified Gonatopus cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Alismatales;Araceae;Philodendroideae;Zamioculcadeae;Gonatopus;unclassified Gonatopus 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;16360;4454;421921;293485;175762;2677302 2664351 no rank unclassified Hypoderma cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;sordariomyceta;Leotiomycetes;Rhytismatales;Rhytismataceae;Hypoderma;unclassified Hypoderma 131567;2759;33154;4751;451864;4890;716545;147538;716546;715989;147548;47166;47167;696359;2664351 2677412 no rank unclassified Hypoderma cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Brachycera;Muscomorpha;Eremoneura;Cyclorrhapha;Schizophora;Calyptratae;Oestroidea;Oestridae;Hypodermatinae;Hypoderma;unclassified Hypoderma 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7203;43733;480118;480117;43738;43742;43755;7387;43915;7388;2677412 2508054 no rank unclassified Hyssopus cellular 
organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Lamiales;Lamiaceae;Nepetoideae;Mentheae;Hyssopus;unclassified Hyssopus 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4143;4136;216706;216718;39168;2508054 2714215 no rank unclassified Hyssopus cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Parasitoida;Chalcidoidea;Eulophidae;Eulophinae;Hyssopus;unclassified Hyssopus 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;1955251;7422;107755;150275;108394;2714215 2320256 no rank unclassified Inga cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Caesalpinioideae;mimosoid clade;Ingeae;Inga;unclassified Inga 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;72025;3803;3804;3807;163486;162809;2320256 2659449 no rank unclassified Inga cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Gelechioidea;Oecophoridae;Oecophorinae;Inga;unclassified Inga 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;85604;7088;41191;41196;41197;37567;37581;57992;116123;690231;2659449 2659477 no rank unclassified Kurzia cellular 
organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Crustacea;Branchiopoda;Phyllopoda;Diplostraca;Cladocera;Anomopoda;Chydoridae;Kurzia;unclassified Kurzia 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6657;6658;116557;84337;6665;116561;77713;527153;2659477 2677456 no rank unclassified Kurzia cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Marchantiophyta;Jungermanniopsida;Jungermanniidae;Jungermanniales;Lophocoleineae;Lepidoziaceae;Lembidioideae;Kurzia;unclassified Kurzia 131567;2759;33090;35493;131221;3193;3195;186771;186782;3199;3204;13806;1484581;428516;2677456 2200772 no rank unclassified Liparis cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Orchidaceae;Epidendroideae;Malaxideae;Malaxidinae;Liparis;unclassified Liparis 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;4747;158332;158393;1759432;78793;2200772 2609094 no rank unclassified Liparis cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;Gnathostomata;Teleostomi;Euteleostomi;Actinopterygii;Actinopteri;Neopterygii;Teleostei;Osteoglossocephalai;Clupeocephala;Euteleosteomorpha;Neoteleostei;Eurypterygia;Ctenosquamata;Acanthomorphata;Euacanthomorphacea;Percomorphaceae;Eupercaria;Perciformes;Cottioidei;Cottales;Liparidae;Liparis;unclassified Liparis 131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;7776;117570;117571;7898;186623;41665;32443;1489341;186625;1489388;123365;123366;123367;123368;123369;1489872;1489922;8111;8100;1490021;183715;183716;2609094 2172497 no rank unclassified Myrmecia cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core 
chlorophytes;Trebouxiophyceae;Trebouxiales;Trebouxiaceae;Myrmecia;unclassified Myrmecia 131567;2759;33090;3041;2692248;75966;2507901;2507902;114064;2172497 2677688 no rank unclassified Myrmecia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Formicoidea;Formicidae;Myrmeciinae;Myrmeciini;Myrmecia;unclassified Myrmecia 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;2153479;36668;36669;232194;13617;2677688 1704022 no rank unclassified Nitrospira cellular organisms;Bacteria;Nitrospirae;Nitrospira;unclassified Nitrospira 131567;2;40117;203693;1704022 2652172 no rank unclassified Nitrospira cellular organisms;Bacteria;Nitrospirae;Nitrospira;Nitrospirales;Nitrospiraceae;Nitrospira;unclassified Nitrospira 131567;2;40117;203693;189778;189779;1234;2652172 2660233 no rank unclassified Periploca cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Amphiesmenoptera;Lepidoptera;Glossata;Neolepidoptera;Heteroneura;Ditrysia;Gelechioidea;Cosmopterigidae;Chrysopeleiinae;Periploca;unclassified Periploca 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;85604;7088;41191;41196;41197;37567;37581;173647;248747;347720;2660233 2677757 no rank unclassified Periploca cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;asterids;lamiids;Gentianales;Apocynaceae;Periplocoideae;Periploca;unclassified Periploca 
131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71274;91888;4055;4056;167485;63484;2677757 2608256 no rank unclassified Ponera cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Hymenoptera;Apocrita;Aculeata;Formicoidea;Formicidae;Ponerinae;Ponerini;Ponera;unclassified Ponera 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7399;7400;7434;2153479;36668;43085;141711;216406;2608256 2677547 no rank unclassified Ponera cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;Liliopsida;Petrosaviidae;Asparagales;Orchidaceae;Epidendroideae;Epidendreae;Ponerinae;Ponera;unclassified Ponera 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;4447;1437197;73496;4747;158332;158389;1005053;123181;2677547 2677834 no rank unclassified Senegalia cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Clostridia;Eubacteriales;Clostridiaceae;Senegalia;unclassified Senegalia 131567;2;1783272;1239;186801;186802;31979;1924097;2677834 2696007 no rank unclassified Senegalia cellular organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;rosids;fabids;Fabales;Fabaceae;Caesalpinioideae;mimosoid clade;Acacieae;Senegalia;unclassified Senegalia 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;71275;91835;72025;3803;3804;3807;163485;468156;2696007 2596711 no rank unclassified Stellaria cellular 
organisms;Eukaryota;Viridiplantae;Streptophyta;Streptophytina;Embryophyta;Tracheophyta;Euphyllophyta;Spermatophyta;Magnoliopsida;Mesangiospermae;eudicotyledons;Gunneridae;Pentapetalae;Caryophyllales;Caryophyllaceae;Alsineae;Stellaria;unclassified Stellaria 131567;2759;33090;35493;131221;3193;58023;78536;58024;3398;1437183;71240;91827;1437201;3524;3568;1141488;13273;2596711 2677902 no rank unclassified Stellaria cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Spiralia;Lophotrochozoa;Mollusca;Gastropoda;Caenogastropoda;Littorinimorpha;Xenophoroidea;Xenophoridae;Stellaria;unclassified Stellaria 131567;2759;33154;33208;6072;33213;33317;2697495;1206795;6447;6448;69555;216294;159995;906789;1297112;2677902 2604509 no rank unclassified Tetraspora cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;core chlorophytes;Chlorophyceae;Tetrasporales;Tetrasporaceae;Tetraspora;unclassified Tetraspora 131567;2759;33090;3041;2692248;3166;31305;35481;56012;2604509 2677711 no rank unclassified Tetraspora cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Myxozoa;Myxosporea;Myxosporea incertae sedis;Tetraspora;unclassified Tetraspora 131567;2759;33154;33208;6072;6073;35581;35582;1051104;148349;2677711 2137841 no rank unclassified Trentepohlia cellular organisms;Eukaryota;Viridiplantae;Chlorophyta;Ulvophyceae;TCBD clade;Trentepohliales;Trentepohliaceae;Trentepohlia;unclassified Trentepohlia 131567;2759;33090;3041;33103;2546214;35443;35445;173374;2137841 2661401 no rank unclassified Trentepohlia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Endopterygota;Diptera;Nematocera;Tipulomorpha;Tipuloidea;Limoniidae;Limoniinae;Trentepohlia;unclassified Trentepohlia 
131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33392;7147;7148;43789;41829;43823;52737;2018059;2661401 2202232 no rank unclassified Vertebrata cellular organisms;Eukaryota;Rhodophyta;Florideophyceae;Rhodymeniophycidae;Ceramiales;Rhodomelaceae;Polysiphonioideae;Vertebrata;unclassified Vertebrata 131567;2759;2763;2806;2045261;2802;2803;2008651;1261581;2202232 2662825 no rank unclassified Vertebrata cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Chordata;Craniata;Vertebrata;unclassified Vertebrata 131567;2759;33154;33208;6072;33213;33511;7711;89593;7742;2662825 2653513 no rank unclassified Yersinia cellular organisms;Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Yersiniaceae;Yersinia;unclassified Yersinia 131567;2;1224;1236;91347;1903411;629;2653513 2677931 no rank unclassified Yersinia cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Protostomia;Ecdysozoa;Panarthropoda;Arthropoda;Mandibulata;Pancrustacea;Hexapoda;Insecta;Dicondylia;Pterygota;Neoptera;Polyneoptera;Dictyoptera;Mantodea;Mantidae;Amelinae;Yersinia;unclassified Yersinia 131567;2759;33154;33208;6072;33213;33317;1206794;88770;6656;197563;197562;6960;50557;85512;7496;33340;33341;6970;7504;7505;267071;444888;2677931 ```
shenwei356 commented 3 years ago

One solution is to add an option for specifying the TaxId field when TaxIds are available. In addition, TaxIds that share the same complete lineage should be detected while parsing the taxdump files.

Done.

Now, for these cases, a warning message is shown and no result is returned. You can still use -a/--output-ambiguous-result to return one possible result, as the old version did.

echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids
19:27:53.478 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019     131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530   Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species

echo 2507530 \
    | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name --show-lineage-ranks \
    | taxonkit reformat --lineage-field 3 --show-lineage-taxids -a
19:30:23.031 [WARN] we can't distinguish the TaxIds (2507530, 2516889) for lineage: cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019. But you can use -a/--output-ambiguous-result to return one possible result
2507530 2507530 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019     131567;2759;33154;4751;451864;5204;5302;155619;355688;452342;5401;5402;2602424;2507530   Russula sp. 8 KA-2019   species no rank;superkingdom;clade;kingdom;subkingdom;phylum;subphylum;class;no rank;order;family;genus;no rank;species  Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019      2759;5204;155619;452342;5401;5402;2507530
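
The behavior shown above can be illustrated with a minimal Python sketch. It is not taxonkit's actual implementation: the function names (`index_by_lineage`, `resolve`) and the `output_ambiguous` parameter are hypothetical, and the rule for picking one candidate (smallest TaxId) is an assumption made only for this example. The two records are the TaxIds and the lineage quoted in this thread.

```python
from collections import defaultdict

# Records (TaxId, complete lineage) quoted in this thread.
LINEAGE = (
    "cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Basidiomycota;"
    "Agaricomycotina;Agaricomycetes;Agaricomycetes incertae sedis;Russulales;"
    "Russulaceae;Russula;unclassified Russula;Russula sp. 8 KA-2019"
)
RECORDS = [(2507530, LINEAGE), (2516889, LINEAGE)]


def index_by_lineage(records):
    """Group TaxIds by their complete lineage string."""
    index = defaultdict(list)
    for taxid, lineage in records:
        index[lineage].append(taxid)
    return index


def resolve(lineage, index, output_ambiguous=False):
    """Return the TaxId for a lineage, or None if it is unknown or ambiguous.

    output_ambiguous mirrors the idea of -a/--output-ambiguous-result:
    return one candidate instead of nothing. Picking the smallest TaxId
    is an assumption of this sketch, not documented tool behavior.
    """
    taxids = index.get(lineage, [])
    if len(taxids) == 1:
        return taxids[0]
    if len(taxids) > 1:
        print(f"[WARN] can't distinguish the TaxIds {sorted(taxids)}")
        return min(taxids) if output_ambiguous else None
    return None


if __name__ == "__main__":
    idx = index_by_lineage(RECORDS)
    print(resolve(LINEAGE, idx))
    print(resolve(LINEAGE, idx, output_ambiguous=True))
```

Grouping by the full lineage string flags only true duplicates, where two TaxIds are indistinguishable by name alone.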

If TaxIds are available, use -I/--taxid-field to specify the field that contains them. :champagne:

$ echo -ne "2507530\n2516889\n" | TAXONKIT_DB=. taxonkit reformat -I 1 -t
2507530 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2507530
2516889 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
standage commented 3 years ago

Tremendous. Thank you!

standage commented 3 years ago

By the way, I submitted Russula sp. 12 KA-2019, unclassified Russula 2507523, 2516885 to the NCBI help desk yesterday, before your response. Maybe we should just point them to this thread for all the others. 😀

shenwei356 commented 3 years ago

Hi @standage, any response from NCBI?

Have you run into any other issues, or do you have any suggestions? I'd like to release a new version with this improved reformat.

standage commented 3 years ago

I haven't had any other issues, thanks!

NCBI responded with the following.

Thank you very much for the notice. We have merged several such erroneous duplicates.

I didn't point them to this thread; I only mentioned Russula sp. 12 KA-2019, unclassified Russula 2507523, 2516885 in my ticket, and I haven't checked whether the latest update fixes the cases you found. So I'm not sure what the status is.

shenwei356 commented 3 years ago

I checked the latest taxdump files: some were merged, while others were not.

09:29:49.752 [WARN] taxid 2516885 was merged into 2507523
09:29:49.752 [WARN] taxid 2516886 was merged into 2507524
09:29:49.752 [WARN] taxid 2516887 was merged into 2507525
09:29:49.752 [WARN] taxid 2516884 was merged into 2507521
09:29:49.752 [WARN] taxid 2516888 was merged into 2507527
09:29:49.752 [WARN] taxid 2516889 was merged into 2507530
$ echo -ne "1105130\n2718636"  | TAXONKIT_DB=. taxonkit lineage |  TAXONKIT_DB=. taxonkit reformat -t
09:31:27.603 [WARN] we can't distinguish the TaxIds (1105130, 2777044) for lineage: cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides. But you can use -a/--output-ambiguous-result to return one possible result
09:31:27.603 [WARN] we can't distinguish the TaxIds (2713500, 2718636) for lineage: cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091. But you can use -a/--output-ambiguous-result to return one possible result
1105130 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Cnidaria;Cubozoa;Chirodropida;Chiropsalmidae;Chiropsoides
2718636 cellular organisms;Bacteria;Terrabacteria group;Firmicutes;Bacilli;Bacillales;Listeriaceae;Listeria;unclassified Listeria;Listeria sp. FSL_L7-0091
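
For readers who want to check merges themselves, the following is a rough Python sketch of reading NCBI's `merged.dmp`, which lists `old_taxid | new_taxid |` pairs separated by tab-pipe-tab. The helper names (`parse_merged`, `current_taxid`) are hypothetical, and the two sample rows are merges reported in the warnings above, not a real file excerpt.

```python
# Sample rows in merged.dmp layout, built from two merges reported above.
MERGED_DMP = "2516885\t|\t2507523\t|\n2516889\t|\t2507530\t|\n"


def parse_merged(text):
    """Parse merged.dmp content into a dict of old TaxId -> new TaxId."""
    merged = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[int(fields[0])] = int(fields[1])
    return merged


def current_taxid(taxid, merged):
    """Follow merge records until reaching a TaxId that is not merged."""
    seen = set()
    while taxid in merged and taxid not in seen:
        seen.add(taxid)  # guard against accidental cycles in the input
        taxid = merged[taxid]
    return taxid


if __name__ == "__main__":
    merged = parse_merged(MERGED_DMP)
    for old in (2516885, 2516889, 2507530):
        print(old, "->", current_taxid(old, merged))
```

TaxIds that still trigger the "can't distinguish" warning, such as 1105130 and 2777044 above, would simply not appear as keys in this mapping.
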
Username-felix-is-not-available commented 3 years ago

@shenwei356 First of all, thank you very much for creating this great tool! It has been very helpful in my research.

If I understood correctly, the warning should only appear if two lineages are completely identical. However, I also get this warning for two species with the same name but different lineages. I am using taxonkit 0.80 and the taxdump downloaded today.

echo -ne "46515\n" | taxonkit lineage | taxonkit reformat

produces

[WARN] we can't distinguish the TaxIds (46515, 1276929)

But the lineages of the two taxa are not identical:

46515 cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;Deuterostomia;Echinodermata;Eleutherozoa;Asterozoa;Asteroidea;Valvatacea;Valvatida;Asterinidae;Asterina;Asterina gibbosa

1276929 cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;saccharomyceta;Pezizomycotina;leotiomyceta;dothideomyceta;Dothideomycetes;Dothideomycetes incertae sedis;Asterinales;Asterinaceae;Asterina;Asterina gibbosa

Is this expected behavior? Have a nice day, Felix

shenwei356 commented 3 years ago

By default, taxonkit reformat finds the TaxId from the taxon name and the name of its parent taxon. Here, that is "Asterina;Asterina gibbosa", which matches both taxa.
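
To make the collision concrete, here is a small Python sketch of a lookup keyed on the parent name and taxon name. It only illustrates the idea described above and is not taxonkit's code: `tail_key`, `build_index`, and `lookup` are hypothetical names, and the data are the two lineages quoted in this thread.

```python
from collections import defaultdict

# The two lineages quoted in this thread, keyed by TaxId.
LINEAGES = {
    46515: (
        "cellular organisms;Eukaryota;Opisthokonta;Metazoa;Eumetazoa;Bilateria;"
        "Deuterostomia;Echinodermata;Eleutherozoa;Asterozoa;Asteroidea;Valvatacea;"
        "Valvatida;Asterinidae;Asterina;Asterina gibbosa"
    ),
    1276929: (
        "cellular organisms;Eukaryota;Opisthokonta;Fungi;Dikarya;Ascomycota;"
        "saccharomyceta;Pezizomycotina;leotiomyceta;dothideomyceta;Dothideomycetes;"
        "Dothideomycetes incertae sedis;Asterinales;Asterinaceae;Asterina;"
        "Asterina gibbosa"
    ),
}


def tail_key(lineage):
    """Return (parent name, taxon name), the last two names of a lineage."""
    return tuple(lineage.split(";")[-2:])


def build_index(lineages):
    """Map each (parent name, taxon name) key to every TaxId that has it."""
    index = defaultdict(list)
    for taxid, lineage in lineages.items():
        index[tail_key(lineage)].append(taxid)
    return index


def lookup(lineage, index, taxid=None):
    """Return candidate TaxIds; a supplied TaxId bypasses the name lookup."""
    if taxid is not None:
        return [taxid]
    return sorted(index.get(tail_key(lineage), []))


if __name__ == "__main__":
    idx = build_index(LINEAGES)
    print(lookup(LINEAGES[46515], idx))         # two candidates: ambiguous
    print(lookup(LINEAGES[46515], idx, 46515))  # TaxId given: unambiguous
```

Because only the last two names form the key, lineages that differ higher up (Metazoa versus Fungi here) still collide, which is why supplying the TaxId directly removes the ambiguity.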

If TaxIds are available, use -I/--taxid-field to specify the field that contains them. :champagne:

$ echo -ne "2507530\n2516889\n" | TAXONKIT_DB=. taxonkit reformat -I 1 -t
2507530 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2507530
2516889 Eukaryota;Basidiomycota;Agaricomycetes;Russulales;Russulaceae;Russula;Russula sp. 8 KA-2019     2759;5204;155619;452342;5401;5402;2516889
Username-felix-is-not-available commented 3 years ago

Thank you for your swift reply! That makes sense. Actually, I wasn't aware of that option, but it makes life easier for me.