motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

Inconsistencies in the mOTUs3 - genome metadata #105

Closed fplazaonate closed 10 months ago

fplazaonate commented 1 year ago

Hello,

I have downloaded the mOTUs3 - genome metadata file from Zenodo. I have noticed that some mOTUs correspond to genomes with completely different GTDB annotations.


mOTU | GTDB taxonomy
-- | --
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Sporofaciens;s__Sporofaciens sp910575835
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__UBA3402;s__UBA3402 sp910586845
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Dysosmobacter;s__Dysosmobacter sp000403435
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Lawsonibacter;s__Lawsonibacter sp009917615
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Lawsonibacter;s__Lawsonibacter sp910575265
ref_mOTU_v3_01070 | d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Muribaculaceae;g__CAG-485;s__CAG-485 sp910588245
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Clostridium_Q;s__Clostridium_Q sp910575795
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Dysosmobacter;s__Dysosmobacter sp910588175
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Pelethomonas;s__Pelethomonas sp910578505
ref_mOTU_v3_01070 | d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Christensenellales;f__Borkfalkiaceae;g__Gallimonas;s__Gallimonas sp910585235

Could you explain this?

Thanks, Florian

hjruscheweyh commented 1 year ago

Hi @fplaza

Thank you for using mOTUs.

You're right. There are issues with a small number of clusters in the mOTUs mapping file. We're currently working on resolving them and will release a new mapping file soon. Keep in mind that this issue does NOT affect the mOTUs database, only the mapping file: the reported genomes are only associated with a mOTUs cluster, and their genes are not added to the database (column MGS_ADDED).

Best, Hans

fplazaonate commented 1 year ago

Thanks Hans. Could you tell me what the correct annotation for ref_mOTU_v3_01070 is?

hjruscheweyh commented 1 year ago

Try:

curl -O https://zenodo.org/record/7146984/files/mOTUs3.genome_metadata.tsv.gz
gunzip -c mOTUs3.genome_metadata.tsv.gz | grep "ref_mOTU_v3_01070" | awk '{ if($4 == "True") { print }}' | cut -f 1,2,10
1906854.SAMN05861045    ref_mOTU_v3_01070   d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Brevundimonas;s__Brevundimonas diminuta
293.SAMN03480409    ref_mOTU_v3_01070   d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Brevundimonas;s__Brevundimonas diminuta
751586.SAMN02469913    ref_mOTU_v3_01070   d__Bacteria;p__Proteobacteria;c__Alphaproteobacteria;o__Caulobacterales;f__Caulobacteraceae;g__Brevundimonas;s__Brevundimonas diminuta
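
To look for other clusters with the kind of mixed annotation reported at the top of this issue, something along these lines might help. It is only a minimal sketch: it assumes, based on the `cut -f 1,2,10` call above, that column 2 of the metadata file holds the mOTU ID and column 10 the GTDB lineage, and the function name `count_families_per_motu` is made up for illustration. It reads the decompressed TSV on stdin and prints every mOTU whose genomes span more than one GTDB family, together with the number of distinct families.

```shell
# Hypothetical helper (not part of mOTUs): flag mOTUs whose associated genomes
# carry more than one GTDB family.
# Assumptions: tab-separated input, column 2 = mOTU ID, column 10 = GTDB lineage.
count_families_per_motu() {
    awk -F'\t' '
        {
            # Pick the family rank (f__...) out of the semicolon-separated lineage.
            fam = ""
            n = split($10, ranks, ";")
            for (i = 1; i <= n; i++) {
                if (ranks[i] ~ /^f__/) fam = ranks[i]
            }
            # Skip rows without a family rank (e.g. a header line).
            if ($2 == "" || fam == "") next
            # Count each (mOTU, family) pair only once.
            key = $2 SUBSEP fam
            if (!(key in seen)) {
                seen[key] = 1
                count[$2]++
            }
        }
        END {
            # Report only mOTUs with at least two distinct families.
            for (m in count) {
                if (count[m] > 1) print m "\t" count[m]
            }
        }
    ' | sort
}

# Usage, with the file downloaded by the curl command above:
# gunzip -c mOTUs3.genome_metadata.tsv.gz | count_families_per_motu
```

Unlike the command above, this does not filter on the fourth column, so it also counts genomes that are only associated with a cluster in the mapping file. Splitting on tabs (`-F'\t'`) keeps species names that contain spaces inside a single field.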