shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

Inconsistent "taxonkit reformat" output for 446045 #35

Closed standage closed 3 years ago

standage commented 3 years ago

Hi @shenwei356, I just upgraded to 0.6.1 and I found some unexpected behavior when querying the lineage for taxid 446045 (Drosophila serrata species complex). The full lineage from taxonkit lineage is consistent and correct, but the abbreviated lineage from taxonkit reformat is inconsistent. The final taxon in the abbreviated lineage switches between 7215 (the correct genus), 32281 (a subgenus), and 2081351 (a totally unrelated genus that coincidentally shares the same name).

$ for i in {1..6}; do echo 446045 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;32281;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;2081351;
446045  446045  cellular organisms/Eukaryota/Opisthokonta/Metazoa/Eumetazoa/Bilateria/Protostomia/Ecdysozoa/Panarthropoda/Arthropoda/Mandibulata/Pancrustacea/Hexapoda/Insecta/Dicondylia/Pterygota/Neoptera/Holometabola/Diptera/Brachycera/Muscomorpha/Eremoneura/Cyclorrhapha/Schizophora/Acalyptratae/Ephydroidea/Drosophilidae/Drosophilinae/Drosophilini/Drosophila/Sophophora/melanogaster group/montium subgroup/Drosophila serrata species complex   131567/2759/33154/33208/6072/33213/33317/1206794/88770/6656/197563/197562/6960/50557/85512/7496/33340/33392/7147/7203/43733/480118/480117/43738/43741/43746/7214/43845/46877/7215/32341/32346/32352/446045    Drosophila serrata species complex      no rank Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;

standage commented 3 years ago

Ruh roh, I found another example. When formatting the lineage for 1973489, the penultimate taxid switches between 1386 (the correct genus) and 55087 (an insect genus of the same name).

$ for i in {1..6}; do echo 1973489 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F    131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489      Bacillus sp. ISSFR-25F  species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
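Both reports share one symptom: a taxid in the reformatted column that does not occur anywhere in the full lineage taxids. A minimal Python sketch of a check for that follows. It assumes the eight tab-separated columns produced by the command above, with column 4 holding the "/"-delimited lineage taxids and column 8 the ";"-delimited reformatted taxids. The function name `stray_taxids` is hypothetical and not part of taxonkit.

```python
def stray_taxids(line: str) -> list[str]:
    """Return reformatted taxids (column 8) that are missing from the
    full lineage taxids (column 4) of one tab-separated output line."""
    fields = line.rstrip("\n").split("\t")
    full_lineage = set(fields[3].split("/"))
    # Empty ranks leave empty items (e.g. a trailing ";"), so skip those.
    reformatted = [t for t in fields[7].split(";") if t]
    return [t for t in reformatted if t not in full_lineage]


# Columns 1-7 of the 1973489 output shown above, joined with tabs.
prefix = "\t".join([
    "1973489",
    "1973489",
    "cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/"
    "Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F",
    "131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489",
    "Bacillus sp. ISSFR-25F",
    "species",
    "Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F",
])
bad_line = prefix + "\t2;1239;91061;1385;186817;55087;1973489"
good_line = prefix + "\t2;1239;91061;1385;186817;1386;1973489"

print(stray_taxids(bad_line))   # taxids not found in the full lineage
print(stray_taxids(good_line))
```

Piping the loop output through a check like this flags the affected runs without comparing the lines by eye.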

shenwei356 commented 3 years ago

Thanks, I will check it tomorrow.

shenwei356 commented 3 years ago

Fixed. I map (name, parent name) pairs to taxIDs to distinguish names shared by different taxIDs. I used this mapping to find the right rank but forgot to apply it to the taxID :(
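The idea of keying on (name, parent name) can be illustrated with a small Python sketch. This is not taxonkit's actual code (taxonkit is written in Go), only a toy model of the lookup. The taxids and the parents "Drosophilini" and "Bacillaceae" are taken from the lineages in this thread. The parents of the two unrelated homonyms are not shown in the thread, so `UNKNOWN-PARENT-A` and `UNKNOWN-PARENT-B` are placeholders, and all variable and function names are hypothetical.

```python
from collections import defaultdict

# (taxid, name, parent name). "UNKNOWN-PARENT-*" are placeholders, not real taxa.
NODES = [
    (7215, "Drosophila", "Drosophilini"),
    (2081351, "Drosophila", "UNKNOWN-PARENT-A"),
    (1386, "Bacillus", "Bacillaceae"),
    (55087, "Bacillus", "UNKNOWN-PARENT-B"),
]

by_name = defaultdict(set)   # name alone: ambiguous for homonyms
by_name_and_parent = {}      # (name, parent name): one taxid per key here

for taxid, name, parent in NODES:
    by_name[name].add(taxid)
    by_name_and_parent[(name, parent)] = taxid


def resolve(name: str, parent: str) -> int:
    """Look up a taxid by its name together with its parent's name."""
    return by_name_and_parent[(name, parent)]


print(sorted(by_name["Drosophila"]))          # more than one candidate taxid
print(resolve("Drosophila", "Drosophilini"))  # a single taxid
print(resolve("Bacillus", "Bacillaceae"))
```

A name-only key leaves several candidates per homonym, which matches the symptom of the reported taxid changing between runs. The composite key gives one answer as long as no two taxa share both a name and a parent name.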

for i in {1..6}; do \
    echo 446045 \
        | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
        | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
        | cut -f 1,7,8; 
done
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;
446045  Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila;  2759;6656;50557;7147;7214;7215;

for i in {1..6}; do \
    echo 1973489 \
        | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
        | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
        | cut -f 1,7,8; 
done
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F      2;1239;91061;1385;186817;1386;1973489