motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

mOTUs #90

Closed: Jibowe closed this issue 2 years ago

Jibowe commented 2 years ago

Hello! I wonder whether “p__Bacteria phylum [Proteobacteria/Bacteroidetes]” in the classification report represents a new phylum that has not been discovered before. Similarly, does “f__Caulobacterales fam. incertae sedis” mean a novel family?

AlessioMilanese commented 2 years ago

Hi @Jibowe,

Not necessarily. In particular, if the annotation comes from a ref-mOTU, it is not a new clade. For example, ref-mOTU 01744 has the annotation:

2 Bacteria
1239 Firmicutes
91061 Bacilli
1385 Bacillales
NA Bacillales fam. incertae sedis
33986 Exiguobacterium
132920 Exiguobacterium antarcticum [Exiguobacterium antarcticum/Exiguobacterium sp. KRL4]

Here the family has no taxid (NA) and is labelled incertae sedis, because the NCBI taxonomy does not have a proper family annotation for this lineage: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=132920

Lineage (full): cellular organisms; Bacteria; Terrabacteria group; Firmicutes; Bacilli; Bacillales; Bacillales incertae sedis; Bacillales Family XII. Incertae Sedis; Exiguobacterium
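To check which ranks of a ref-mOTU annotation lack an NCBI taxid, the "taxid name" lines shown above can be parsed mechanically. This is a minimal sketch, not part of mOTUs: the helper names and the rank labels in `RANKS` are assumptions, chosen to match the seven lines of the 01744 example.

```python
# Hypothetical helper (not part of mOTUs): parse a per-rank annotation block
# with one "taxid name" pair per line, where "NA" means "no NCBI taxid".
from typing import List, Optional, Tuple

# Assumed rank labels, one per line, in the order of the example above.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def parse_annotation(block: str) -> List[Tuple[str, Optional[int], str]]:
    """Return (rank, taxid or None, name) for each non-empty line."""
    lines = [ln.strip() for ln in block.strip().splitlines() if ln.strip()]
    rows = []
    for rank, line in zip(RANKS, lines):
        taxid_str, name = line.split(maxsplit=1)
        taxid = None if taxid_str == "NA" else int(taxid_str)
        rows.append((rank, taxid, name))
    return rows


def ranks_without_taxid(rows: List[Tuple[str, Optional[int], str]]) -> List[str]:
    """Ranks whose label has no NCBI taxid (e.g. 'incertae sedis' placeholders)."""
    return [rank for rank, taxid, _ in rows if taxid is None]


example = """
2 Bacteria
1239 Firmicutes
91061 Bacilli
1385 Bacillales
NA Bacillales fam. incertae sedis
33986 Exiguobacterium
132920 Exiguobacterium antarcticum [Exiguobacterium antarcticum/Exiguobacterium sp. KRL4]
"""

rows = parse_annotation(example)
print(ranks_without_taxid(rows))  # ['family']
```

For a ref-mOTU, a rank flagged this way reflects a gap in the NCBI taxonomy, not a newly discovered clade.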

AlessioMilanese commented 2 years ago

Also, with ref-mOTUs two clades can be clustered into the same mOTU. In that case we report both within square brackets.

Your example corresponds to ref-mOTU 00077:

2 Bacteria
NA Bacteria phylum [Proteobacteria/Bacteroidetes]
NA Bacteria class [Sphingobacteriia/Gammaproteobacteria]
NA Bacteria order [Sphingobacteriales/Enterobacterales]
NA Bacteria fam. [Erwiniaceae/Enterobacteriaceae/Sphingobacteriaceae]
NA Bacteria gen. [Lelliottia/Pantoea/Enterobacter/Escherichia/Klebsiella/Leclercia/Pedobacter]
NA Bacteria sp. [Enterobacter sp. SENG-6/Enterobacter sp. MGH 1/Enterobacter sp. MGH 3/Enterobacter sp. MGH 6/Enterobacter sp. MGH 7/Enterobacter sp. MGH 10/Enterobacter sp. MGH 14/Enterobacter sp. MGH 15/Enterobacter sp. MGH 22/Enterobacter sp. MGH 23/Enterobacter sp. MGH 24/Enterobacter sp. MGH 25/Enterobacter sp. MGH 33/Enterobacter sp. MGH 37/Enterobacter sp. MGH 38/Enterobacter sp. BWH 37/Enterobacter sp. BIDMC 26/Enterobacter sp. BIDMC 27/Enterobacter sp. BIDMC 28/Enterobacter sp. BIDMC 30/Enterobacter sp. EGD-HP1/Enterobacter sp. DC3/Enterobacter sp. DC4/Enterobacter sp. T1-1/Enterobacter sp. UCD-UG_FMILLET/Enterobacter sp. E20/Enterobacter sp. NFIX58/Enterobacter sp. NFIX45/Enterobacter sp. NFIX59/Enterobacter sp. 940_PEND/Enterobacter sp. HMSC16D10/Enterobacter hormaechei/Enterobacter sp. BIDMC92/Leclercia sp. LK8/Enterobacter sp. BWH52/Enterobacter sp. BWH63/Enterobacter sp. BWH64/Enterobacter sp. MGH119/Enterobacter sp. MGH120/Enterobacter sp. MGH128/Enterobacter sp. BIDMC87/Enterobacter sp. BIDMC93/Enterobacter sp. BIDMC94/Enterobacter sp. BIDMC99/Enterobacter sp. BIDMC100/Enterobacter sp. BIDMC109/Enterobacter sp. 50588862/Enterobacter sp. 50793107/Enterobacter sp. 50858885/Enterobacter sp. K66-74/Enterobacter roggenkampii/Enterobacter sp. ODB01/Enterobacter sp. IF2SW-P2/Enterobacter sp. HK169/Enterobacter sp. PDC34/Pantoea sesami/Enterobacter sp. ku-bf2/Enterobacter sp. 56-7/Enterobacter sp. ST121:950178628/Enterobacter sp. J49/Enterobacter sichuanensis/Enterobacter kobei/Enterobacter genomosp. O/Enterobacter genomosp. S/Pedobacter himalayensis/Enterobacter chengduensis/Enterobacter ludwigii/Enterobacter sp. DC1/Klebsiella aerogenes/Enterobacter cloacae/Escherichia coli/Klebsiella oxytoca/Enterobacter asburiae/Lelliottia amnigena/Enterobacter cancerogenus/Lelliottia nimipressuralis/Leclercia adecarboxylata/Enterobacter bugandensis]

Here NCBI has misclassified some of the genomes. The genomes are genetically similar (which is why they are in the same mOTU), but their NCBI annotations differ so much that they even fall into different phyla.
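When such a bracketed label shows up in a report, it can help to list the clades that were merged. A minimal sketch, assuming the label always has the form `<base> [<clade1>/<clade2>/...]` as in the 00077 annotation above; `split_merged_label` is a hypothetical name, not a mOTUs function.

```python
# Hypothetical helper (not part of mOTUs): split a consensus label such as
# "Bacteria phylum [Proteobacteria/Bacteroidetes]" into its base name and
# the clades listed inside the square brackets.
import re
from typing import List, Tuple

# Assumed format: "<base label> [<clade1>/<clade2>/...]" at the end of the label.
_BRACKET = re.compile(r"^(?P<base>.*?)\s*\[(?P<members>[^\]]*)\]\s*$")


def split_merged_label(label: str) -> Tuple[str, List[str]]:
    """Return (base, clades); labels without brackets get an empty clade list."""
    m = _BRACKET.match(label)
    if m is None:
        return label.strip(), []
    members = [x.strip() for x in m.group("members").split("/") if x.strip()]
    return m.group("base"), members


base, clades = split_merged_label("Bacteria phylum [Proteobacteria/Bacteroidetes]")
print(base)    # Bacteria phylum
print(clades)  # ['Proteobacteria', 'Bacteroidetes']
```

A non-empty clade list at phylum level is a sign of conflicting NCBI annotations inside one mOTU, not of a new phylum.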

AlessioMilanese commented 2 years ago

For meta-mOTUs and ext-mOTUs, on the other hand, an incertae sedis annotation means that we could not annotate the mOTU at that level. It may be a new clade, but it may also mean that we did not have enough confidence to assign an annotation.
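The rules in this thread can be summarised as a small decision function. This is only a sketch of the reasoning above, not mOTUs code: the ID prefixes `ref_mOTU`, `meta_mOTU` and `ext_mOTU`, the example IDs, and the function name are assumptions for illustration.

```python
# Hypothetical helper (not part of mOTUs): interpret a label according to the
# rules described in this thread. The ID prefixes are an assumption about how
# mOTU identifiers look (e.g. "ref_mOTU_..." vs "meta_mOTU_..." / "ext_mOTU_...").
def interpret_label(motu_id: str, label: str) -> str:
    is_ref = motu_id.startswith("ref_mOTU")
    text = label.lower()
    if is_ref and "[" in label and "]" in label:
        # Several NCBI clades were clustered into one ref-mOTU.
        return "merged clades with conflicting NCBI annotations"
    if "incertae sedis" in text:
        if is_ref:
            # Placeholder inherited from the NCBI taxonomy itself.
            return "gap in the NCBI taxonomy, not a new clade"
        # meta-/ext-mOTU: no confident annotation at this rank.
        return "unannotated at this rank: possibly a new clade, or low confidence"
    return "regular annotation"


# Hypothetical IDs, used only to exercise the rules.
print(interpret_label("ref_mOTU_v25_01744", "Bacillales fam. incertae sedis"))
print(interpret_label("meta_mOTU_v25_00001", "Caulobacterales fam. incertae sedis"))
```

Only the meta-/ext-mOTU case leaves room for a genuinely novel clade, and even then the label alone does not prove it.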