motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

only 10 marker genes in mOTU_version2 #19

Jigyasa3 closed this issue 5 years ago

Jigyasa3 commented 5 years ago

I was wondering whether it is possible to get the remaining 30 marker genes of mOTU_version2?

AlessioMilanese commented 5 years ago

Hi @Jigyasa3, I'm not sure I understood the question:

Jigyasa3 commented 5 years ago

Hey Alessio, thanks for replying. I would like to have the sequences of the other 30 MGs.

AlessioMilanese commented 5 years ago

I think you can find them in proGenomes (http://progenomes.embl.de/). But since that might not be so easy, I've added the genes to an FTP server: https://www.embl.de/download/zeller/progenomes_40MGs.tar.gz (205 MB).

Jigyasa3 commented 5 years ago

Thank you so much! I really appreciate all the help! I have one more question. The marker genes from progenomes only correspond to whole genomes, and not the metagenomes. Is it possible to get the same from metagenomes?

AlessioMilanese commented 5 years ago

From the metagenomes we extracted only the 10 selected MGs, so the full set of 40 is not available for them.

See this passage from the original mOTUs paper:

We assessed the suitability of each of the 40 marker genes for microbial composition profiling at the species level based on the false discovery rate (FDR) for their identification in reference genomes and the accuracy of their respective mOTUs in species-level profiling (Supplementary Table 4 and Online Methods). From this analysis, we selected the ten best-performing marker genes, which had an average FDR of 1.4% (range, 0.1%–3.8%) and a mean ambiguous read alignment rate of 3.5% (range, 0.9%–6.4%; Supplementary Table 4 and Online Methods).

Jigyasa3 commented 5 years ago

Dear @AlessioMilanese

Sorry to bother you again. I am not able to find the full names for the refMG IDs. For example, which bacterial genome in the progenomes.tar.gz file does refMG0000000.COG0552_2 correspond to?

AlessioMilanese commented 5 years ago

Maybe this file will solve your problem: https://zenodo.org/record/2635425#.XQjFrXtS_UI

Jigyasa3 commented 5 years ago

Thanks for the link!

This link gives "refMG0003438.COG0018 COG0018.mOTU.v2.0005149 ref_mOTU_v2_1116", but I want to connect the refMG0003438.COG0018 information to ">652616.PRJDB66.ERDMAN_1444 <annotation product=arginyl-tRNA synthetase " from the progenomes.tar.gz file. Is that possible?

AlessioMilanese commented 5 years ago

The gene ids contain:

>652616.PRJDB66.ERDMAN_1444 <annotation product=arginyl-tRNA synthetase 

where 652616.PRJDB66 is taxonomy_id.project_id, which identifies the genome. From this genome ID, you can map to the ref-mOTUs (see attached genome_to_specI.tsv.zip).
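
For illustration, here is a minimal Python sketch of pulling the genome ID out of such a header. It assumes every header follows the layout of the example above (the ID is the first whitespace-delimited token, and its first two dot-separated fields are taxonomy_id and project_id). The function name `split_gene_id` is hypothetical and not part of mOTUs or proGenomes.

```python
def split_gene_id(header):
    """Split a FASTA header into (genome_id, gene_name).

    Assumption: the ID is the first token of the header and has the form
    taxonomy_id.project_id.gene_name, as in the example from this thread.
    """
    gene_id = header.lstrip(">").split()[0]
    # Split on the first two dots only, so gene names containing dots survive.
    taxonomy_id, project_id, gene_name = gene_id.split(".", 2)
    return f"{taxonomy_id}.{project_id}", gene_name


header = ">652616.PRJDB66.ERDMAN_1444 <annotation product=arginyl-tRNA synthetase "
genome_id, gene_name = split_gene_id(header)
print(genome_id, gene_name)  # 652616.PRJDB66 ERDMAN_1444
```

The resulting genome ID is the key to look up in genome_to_specI.tsv.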

I do not have a map from gene -> MGC -> ref-mOTUs, because the MGCs are not defined for the marker genes that do not belong to the 10 selected. And in a way, you don't need it, because the MGCs and ref-mOTUs have the same information.

For example:

ref_mOTU_v2_0001    1410657.PRJNA223467
ref_mOTU_v2_0001    1410658.PRJNA223501
ref_mOTU_v2_0001    1410659.PRJNA223465
ref_mOTU_v2_0002    1000565.PRJNA64645
ref_mOTU_v2_0002    983952.PRJNA81137
ref_mOTU_v2_0003    1000568.PRJNA64689

Then the COG0012 genes of the three genomes in ref_mOTU_v2_0001, i.e. 1410657.PRJNA223467-COG0012+1410658.PRJNA223501-COG0012+1410659.PRJNA223465-COG0012, form the MGC for COG0012 for ref_mOTU_v2_0001.
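
The grouping described above could be sketched in Python roughly as follows. This assumes genome_to_specI.tsv is a two-column, tab-separated file laid out like the excerpt (ref-mOTU first, genome ID second); that layout is inferred from the excerpt and has not been checked against the real file. The function names `load_genome_map` and `mgc_members` are hypothetical, and the excerpt is embedded as a string so the example runs on its own.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the excerpt in this thread; tab separation is an assumption.
EXCERPT = """\
ref_mOTU_v2_0001\t1410657.PRJNA223467
ref_mOTU_v2_0001\t1410658.PRJNA223501
ref_mOTU_v2_0001\t1410659.PRJNA223465
ref_mOTU_v2_0002\t1000565.PRJNA64645
ref_mOTU_v2_0002\t983952.PRJNA81137
ref_mOTU_v2_0003\t1000568.PRJNA64689
"""


def load_genome_map(handle):
    """Return (genome -> ref-mOTU, ref-mOTU -> list of genomes)."""
    genome_to_motu = {}
    motu_to_genomes = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        motu, genome = row[0], row[1]
        genome_to_motu[genome] = motu
        motu_to_genomes[motu].append(genome)
    return genome_to_motu, motu_to_genomes


def mgc_members(motu, cog, motu_to_genomes):
    """List the genome-COG pairs that make up the MGC of one COG in one ref-mOTU."""
    return [f"{genome}-{cog}" for genome in motu_to_genomes.get(motu, [])]


genome_to_motu, motu_to_genomes = load_genome_map(io.StringIO(EXCERPT))
print(genome_to_motu["1410657.PRJNA223467"])  # ref_mOTU_v2_0001
print("+".join(mgc_members("ref_mOTU_v2_0001", "COG0012", motu_to_genomes)))
```

For the real file, `load_genome_map(open("genome_to_specI.tsv"))` would replace the `io.StringIO` call, assuming the unzipped attachment has that name.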