Closed Jigyasa3 closed 5 years ago
Hi @Jigyasa3, I'm not sure I understood the question:
Hey Alessio, thanks for replying. I would like to have the sequences of the other 30 MGs.
I think you can find them in proGenomes (http://progenomes.embl.de/). But since that might not be so easy, I've added the genes to an FTP server: https://www.embl.de/download/zeller/progenomes_40MGs.tar.gz (205 MB).
Thank you so much! I really appreciate all the help! I have one more question. The marker genes from progenomes only correspond to whole genomes, and not the metagenomes. Is it possible to get the same from metagenomes?
From the metagenomes we extracted only the 10 MGs, hence there are not 40.
See from the original mOTUs paper:
We assessed the suitability of each of the 40 marker genes for microbial composition profiling at the species level based on the false discovery rate (FDR) for their identification in reference genomes and the accuracy of their respective mOTUs in species-level profiling (Supplementary Table 4 and Online Methods). From this analysis, we selected the ten best-performing marker genes, which had an average FDR of 1.4% (range, 0.1%–3.8%) and a mean ambiguous read alignment rate of 3.5% (range, 0.9%–6.4%; Supplementary Table 4 and Online Methods).
Dear @AlessioMilanese
Sorry to bother you again. I am not able to find the full names behind the refMG ids. For example, which bacterial genome in the progenomes.tar.gz file does refMG0000000.COG0552_2 correspond to?
Maybe this file will solve your problem: https://zenodo.org/record/2635425#.XQjFrXtS_UI
thanks for the link!
This link gives- "refMG0003438.COG0018 COG0018.mOTU.v2.0005149 ref_mOTU_v2_1116" But I want to connect the refMG0003438.COG0018 information to ">652616.PRJDB66.ERDMAN_1444 <annotation product=arginyl-tRNA synthetase " from progenomes.tar.gz file. Is it possible?
The gene ids contain:
>652616.PRJDB66.ERDMAN_1444 <annotation product=arginyl-tRNA synthetase
where 652616.PRJDB66 is taxonomy_id.project_id.
From this, you can map to the ref-mOTUs (see attached genome_to_specI.tsv.zip).
I do not have a map from gene -> MGC -> ref-mOTUs, because the MGCs are not defined for the marker genes that do not belong to the 10 selected. And in a way, you don't need it, because the MGCs and ref-mOTUs have the same information.
For example:
ref_mOTU_v2_0001 1410657.PRJNA223467
ref_mOTU_v2_0001 1410658.PRJNA223501
ref_mOTU_v2_0001 1410659.PRJNA223465
ref_mOTU_v2_0002 1000565.PRJNA64645
ref_mOTU_v2_0002 983952.PRJNA81137
ref_mOTU_v2_0003 1000568.PRJNA64689
then:
1410657.PRJNA223467-COG0012
+1410658.PRJNA223501-COG0012
+1410659.PRJNA223465-COG0012
form the MGC for COG0012 for ref_mOTU_v2_0001.
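The mapping described above can be sketched in a few lines of Python. This is only an illustration, not code from mOTUs: the function names are made up, and it assumes that genome_to_specI.tsv has two whitespace-separated columns in the same order as the excerpt above (ref-mOTU first, then taxonomy_id.project_id), and that every gene id starts with taxonomy_id.project_id followed by a locus tag.

```python
import io
from collections import defaultdict


def genome_id_from_header(header):
    """Return 'taxonomy_id.project_id' from a gene header such as
    '>652616.PRJDB66.ERDMAN_1444 <annotation ...'."""
    gene_id = header.lstrip(">").split()[0]
    # Split only on the first two dots; the rest is treated as the locus tag.
    taxonomy_id, project_id, _locus_tag = gene_id.split(".", 2)
    return f"{taxonomy_id}.{project_id}"


def load_genome_to_motu(handle):
    """Read 'ref-mOTU <whitespace> genome' rows into a genome -> ref-mOTU dict.
    The column order is an assumption based on the excerpt in this thread."""
    mapping = {}
    for line in handle:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        motu, genome = fields
        mapping[genome] = motu
    return mapping


def group_genes_by_motu(headers, genome_to_motu):
    """Group gene ids by the ref-mOTU of the genome they come from.
    Genes from genomes missing in the mapping are skipped."""
    groups = defaultdict(list)
    for header in headers:
        motu = genome_to_motu.get(genome_id_from_header(header))
        if motu is not None:
            groups[motu].append(header.lstrip(">").split()[0])
    return dict(groups)


if __name__ == "__main__":
    # Rows copied from the excerpt above; the gene headers are hypothetical.
    table = io.StringIO(
        "ref_mOTU_v2_0001\t1410657.PRJNA223467\n"
        "ref_mOTU_v2_0001\t1410658.PRJNA223501\n"
        "ref_mOTU_v2_0002\t1000565.PRJNA64645\n"
    )
    genome_to_motu = load_genome_to_motu(table)
    headers = [
        ">1410657.PRJNA223467.HYPOTHETICAL_0001",
        ">1410658.PRJNA223501.HYPOTHETICAL_0002",
    ]
    print(group_genes_by_motu(headers, genome_to_motu))
```

Running this on the headers of a single COG file from the tarball would collect, for each ref-mOTU, the genes that play the role of its MGC for that COG.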
I was wondering, if it is possible to get the remaining 30 marker genes of mOTU_version2?