soedinglab / MMseqs2

MMseqs2: ultra fast and sensitive search and clustering suite
https://mmseqs.com
GNU General Public License v3.0

NCBI taxids not matching #481

Open padpadpadpad opened 2 years ago

padpadpadpad commented 2 years ago

Hi

Firstly, I have found the documentation and usage of MMseqs2 for taxonomic assignment to be extremely good and easy to follow!

However, just a quick point: some of the NCBI taxids given do not match the LCA scientific name.

Below is the createtsv output for some of my assignments. Based on previous sequencing, I am fairly sure the LCA scientific names (the taxa column here) are correct, but the NCBI taxids do not link to those names. For example, my search here finds no NCBI entry for taxid 19759.

tax_id  taxonomic_level  taxa                        tax_id_all
58962   species          Achromobacter veterisilvae  2;3;4;46;62;468;58962
2       superkingdom     Bacteria                    2
6620    species          Ochrobactrum B soli         2;3;79;91;92;6619;6620
92      family           Rhizobiaceae A              2;3;79;91;92
76      genus            Pseudomonas E               2;3;4;29;30;76
180     genus            Stenotrophomonas            2;3;4;127;128;180
19759   species          Variovorax sp003019815      2;3;4;46;62;2887;19759
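
The kind of check described above (does each assigned taxid resolve to the same scientific name in NCBI?) can be sketched in Python. This is only a minimal sketch under stated assumptions: the raw createtsv file is assumed to be tab-separated with the four columns shown above, and the NCBI names are assumed to come from a `names.dmp` file of the NCBI taxonomy dump. The names `Assignment`, `parse_row`, `parse_names_dmp`, and `find_mismatches` are hypothetical helpers written for this illustration; they are not part of MMseqs2.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Assignment(NamedTuple):
    tax_id: int
    rank: str
    name: str
    lineage: List[int]


def parse_row(line: str) -> Assignment:
    """Parse one row, ASSUMING four tab-separated columns:
    tax_id, taxonomic_level, taxa, tax_id_all (';'-separated lineage)."""
    tax_id, rank, name, lineage = line.rstrip("\n").split("\t")
    return Assignment(int(tax_id), rank, name,
                      [int(t) for t in lineage.split(";")])


def parse_names_dmp(lines: Iterable[str]) -> Dict[int, str]:
    """Build taxid -> scientific name from names.dmp-style lines.
    Fields in names.dmp are delimited by a tab followed by '|'."""
    names: Dict[int, str] = {}
    for line in lines:
        fields = [f.strip() for f in line.split("\t|")]
        # Field 3 is the name class; keep only the scientific name.
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def find_mismatches(
    rows: Iterable[Assignment], ncbi_names: Dict[int, str]
) -> List[Tuple[int, str, Optional[str]]]:
    """Return (tax_id, assigned_name, ncbi_name) for every row whose
    assigned name differs from the name in the mapping.
    ncbi_name is None when the taxid is absent from the mapping."""
    mismatches = []
    for row in rows:
        ncbi_name = ncbi_names.get(row.tax_id)
        if ncbi_name != row.name:
            mismatches.append((row.tax_id, row.name, ncbi_name))
    return mismatches
```

To use it, one would pass the lines of the assignment file (without the header row) through `parse_row`, pass an opened `names.dmp` to `parse_names_dmp`, and inspect the list returned by `find_mismatches`. A taxid that is missing from the mapping, as reported for 19759, shows up with `None` in the third position.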

zhou-sumei commented 2 years ago

I ran into the same problem: taxid 3177 (taxon name: species Ruminococcus E sp003526955) can't be found in the NCBI taxonomy database. @milot-mirdita