motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

Know how much of an environment is covered #81

Closed SilasK closed 10 months ago

SilasK commented 2 years ago

I have another mOTUs question. I've seen that you extended the mOTUs database quite a bit for different environments. However, I would like to use mOTUs on new environments, e.g. soil. How can I know whether a high fraction of the species is covered by the mOTUs database? As I understand it, the database contains mOTUs derived from reference genomes and from metagenomes. But is there a way to know how much I don't know?

If I used a Kraken mapping, I obviously would not get more species, but at least I would have an idea of the mapping rate.

SilasK commented 2 years ago

Answer from @hjruscheweyh

Getting an idea of what you have missed, or at what rate, is always a hard task for tools that calculate taxonomic abundance. mOTUs addresses this with the unassigned (previously -1) mOTU, which collects all marker genes for which we have no taxonomic information or no linkage to other marker genes. Version 3 also updated this cluster massively: we basically added to the unassigned cluster all marker genes that we assembled and that were not represented by a MAG or a mOTU. The unassigned cluster should give you an estimate of how much you miss, even though you have no clue what you miss. If you work with a well-covered environment, this estimate will be closer to the truth. And you're absolutely right: you have to be more careful when working with environments that (1) are incredibly complex (hard to assemble, hard to bin) and/or (2) have little representation in the database. This holds true for soil. We have assembled quite a number of metagenomes, and the number of MAGs that we got out was quite unimpressive (the number of unassigned marker genes was relatively comparable, though). A few things you can do if you want to dig deeper into a specific environment and are unsure how well it is covered in mOTUs:

  1. Assemble the metagenomes, extract the marker genes (fetchMGs), and compare the marker genes directly against the mOTUs database. This should give you a good picture of what is missing in the mOTUs database.
  2. Extend the database with your own MAGs (Extender for mOTUs3 will be released next week)
  3. Assemble your metagenomes and get the marker genes. Use one of the marker genes as described here: https://www.nature.com/articles/s41564-017-0008-3#Sec1
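
As a rough illustration of the unassigned estimate discussed above, here is a minimal Python sketch that computes the share of a profile's total abundance that falls in the unassigned row. It assumes a tab-separated profile in which header lines start with `#`, the first column is the mOTU label, the last column is the abundance, and the unassigned row is labelled `unassigned` (or `-1` in older versions). The function name and the example rows are hypothetical, so check them against your own output before relying on this.

```python
def unassigned_fraction(profile_lines, unassigned_labels=("unassigned", "-1")):
    """Return the share of total abundance that falls in the unassigned row.

    Assumed layout (not verified against every mOTUs version):
    '#' header lines, then tab-separated rows of label and abundance.
    """
    total = 0.0
    unassigned = 0.0
    for line in profile_lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        label, value = fields[0], float(fields[-1])
        total += value
        if label in unassigned_labels:
            unassigned += value
    # Avoid dividing by zero for an empty profile.
    return unassigned / total if total > 0 else 0.0


# Hypothetical profile content, made up purely for illustration.
example_profile = [
    "# hypothetical header line",
    "#consensus_taxonomy\tsample_1",
    "Species A [hypothetical_mOTU_1]\t30",
    "Species B [hypothetical_mOTU_2]\t20",
    "unassigned\t50",
]
print(unassigned_fraction(example_profile))
```

A high value only says that many marker genes could not be assigned. As noted above, in a poorly covered environment such as soil the number itself may underestimate what is missing.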

Hope that helps. I'm happy to discuss further if you have additional ideas.

SilasK commented 2 years ago

Thank you very much for your answer.

matthpich commented 2 years ago

Hi, thanks for the great tool. @SilasK did you find the "Extender for mOTUs3" mentioned by @hjruscheweyh? I would like to predict the 10 marker genes in a metagenomic dataset and thought it could help. Many thanks!

AlessioMilanese commented 2 years ago

Hi @matthpich,

If you want to predict the 10 marker genes from a metagenomic sample (i.e. a fastq file with reads), then the extender will not help you. The extender needs MAGs as input, i.e. reads that have already been assembled and binned.

matthpich commented 2 years ago

Dear @AlessioMilanese, thanks for the prompt answer. I have already assembled the reads and predicted genes/proteins, but I do not have MAGs. So I am thinking that the latest versions of fetchMGs or emapper could help identify whether any of the marker genes in the sample are missing from the mOTUs db, and therefore whether mOTUs reflects the content of my sample accurately. Am I correct?

AlessioMilanese commented 2 years ago

Yes, correct!

You can use fetchMGs (https://github.com/motu-tool/fetchMGs) to predict the 10 MGs from your matching gene/protein files. You can then use vsearch to compare them to the mOTUs DB.
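
To make the comparison step concrete, here is a small Python sketch that takes the IDs of the extracted marker genes plus a tabular hit file and reports which genes have no sufficiently similar hit in the database. It assumes BLAST-style 12-column tabular output (query, target, percent identity, ...), as produced for example by `vsearch --usearch_global` with `--blast6out`. The function names, the example data, and the 97% identity cutoff are illustrative assumptions, not values recommended by mOTUs.

```python
def read_fasta_ids(fasta_lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    ids = []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def split_by_db_hit(query_ids, blast6_lines, min_identity=97.0):
    """Split query IDs into (covered, missing) by best-effort hit lookup.

    A query counts as covered if at least one hit row reaches
    min_identity (column 3, in percent). The cutoff is a placeholder.
    """
    hit_ids = set()
    for line in blast6_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        if float(fields[2]) >= min_identity:
            hit_ids.add(fields[0])
    covered = [q for q in query_ids if q in hit_ids]
    missing = [q for q in query_ids if q not in hit_ids]
    return covered, missing


# Hypothetical inputs, made up purely for illustration.
example_fasta = [">gene_1 some description", "ATGC", ">gene_2", "ATGC", ">gene_3", "ATGC"]
example_hits = [
    "gene_1\tdb_seq_9\t99.1\t300\t3\t0\t1\t300\t1\t300\t-1\t0",
    "gene_2\tdb_seq_4\t85.0\t300\t45\t0\t1\t300\t1\t300\t-1\t0",
]
covered, missing = split_by_db_hit(read_fasta_ids(example_fasta), example_hits)
print(f"covered: {covered}, missing: {missing}")
print(f"fraction missing: {len(missing) / (len(covered) + len(missing)):.2f}")
```

The fraction of marker genes without a database hit gives a rough sense of how much of the sample the mOTUs DB does not represent. Running it per marker gene family rather than over all genes pooled may be more informative.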

matthpich commented 2 years ago

Perfect, thanks @AlessioMilanese.

Jibowe commented 2 years ago

Hi, where is the link to the Extender for mOTUs3? I can't find it online. I need to install it to complete my data analysis work. Many thanks!

AlessioMilanese commented 1 year ago

Here is the link for the extender: https://github.com/motu-tool/mOTUs-extender