Closed by standage 3 years ago
Ruh roh, I found another example. When formatting the lineage for 1973489, the penultimate taxid switches between 1386 (the correct genus) and 55087 (an insect genus of the same name).
$ for i in {1..6}; do echo 1973489 | taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / | taxonkit reformat --lineage-field 3 --show-lineage-taxids -d /; done
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;55087;1973489
1973489 1973489 cellular organisms/Bacteria/Terrabacteria group/Firmicutes/Bacilli/Bacillales/Bacillaceae/Bacillus/Bacillus cereus group/Bacillus sp. ISSFR-25F 131567/2/1783272/1239/91061/1385/186817/1386/86661/1973489 Bacillus sp. ISSFR-25F species Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
Thanks, I will check it tomorrow.
Fixed. I was mapping (name, parent-name) pairs to taxIDs to distinguish names shared by different taxIDs. I used that mapping to find the right rank but forgot to apply it to the taxid :(
for i in {1..6}; do \
echo 446045 \
| taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
| taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
| cut -f 1,7,8;
done
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
446045 Eukaryota;Arthropoda;Insecta;Diptera;Drosophilidae;Drosophila; 2759;6656;50557;7147;7214;7215;
for i in {1..6}; do \
echo 1973489 \
| taxonkit lineage --show-lineage-taxids --show-rank --show-status-code --show-name -d / \
| taxonkit reformat --lineage-field 3 --show-lineage-taxids -d / \
| cut -f 1,7,8;
done
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
1973489 Bacteria;Firmicutes;Bacilli;Bacillales;Bacillaceae;Bacillus;Bacillus sp. ISSFR-25F 2;1239;91061;1385;186817;1386;1973489
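For readers following along, the (name, parent-name) idea from the fix above can be sketched roughly as below. This is only an illustration, not TaxonKit's actual code: the table, `candidates`, and `resolve_taxid` are hypothetical names, and the parent name used for 55087 is a placeholder because the thread does not state it. Only the Bacillus/Bacillaceae pair and the two taxids come from the output above.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: (name, parent name) -> taxid.
# ("Bacillus", "Bacillaceae") -> 1386 is taken from the lineage shown in this thread.
# The parent name for 55087 is a PLACEHOLDER, not real taxonomy data.
TAXID_BY_NAME_AND_PARENT: Dict[Tuple[str, str], int] = {
    ("Bacillus", "Bacillaceae"): 1386,
    ("Bacillus", "<parent of insect genus>"): 55087,
}


def candidates(name: str) -> List[int]:
    """All taxids sharing a name; more than one means the name alone is ambiguous."""
    return sorted(
        taxid
        for (taxon_name, _parent), taxid in TAXID_BY_NAME_AND_PARENT.items()
        if taxon_name == name
    )


def resolve_taxid(name: str, parent_name: str) -> int:
    """Pick the taxid by (name, parent name) instead of by name alone."""
    return TAXID_BY_NAME_AND_PARENT[(name, parent_name)]


if __name__ == "__main__":
    print(candidates("Bacillus"))                    # two taxids share the name
    print(resolve_taxid("Bacillus", "Bacillaceae"))  # the bacterial genus
```

Keying on the pair means a shared name such as Bacillus always resolves to the same taxid for a given parent, which is consistent with the stable 1386 in the output above.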
Hi @shenwei356, I just upgraded to 0.6.1 and found some unexpected behavior when querying the lineage for taxid 446045 (Drosophila serrata species complex). The full lineage from taxonkit lineage is consistent and correct, but the abbreviated lineage from taxonkit reformat is inconsistent. The final taxon in the abbreviated lineage switches between 7215 (the correct genus), 32281 (a subgenus), and 2081351 (a totally unrelated genus that coincidentally shares the same name).