motu-tool / mOTUs

motus - a tool for marker gene-based OTU (mOTU) profiling
GNU General Public License v3.0

which CDS is meta_12476? #54

Closed by zckoo007 3 years ago

zckoo007 commented 4 years ago

Hi all,

I used your excellent software for taxonomic profiling and found an unknown bacterium that is particularly abundant in my case samples (meta_mOTU_v25_12476). I then did binning and assembled 100 high-quality bins, and I want to find out which of these bins corresponds to this unknown bacterium. When I ran the following command, I found more than one CDS, so which CDS is meta_12476? Could you please tell me how to match my bins to this CDS?

$ grep "_12476" db_mOTU_DB_CEN.fasta.annotations
1678    meta_mOTU_v25_12476.metaMG0012618_COG0012   meta_mOTU_v25_12476.metaMG0012618_COG0012   CDS <annotation product= NaN /> 1098    101 1198    +               0   0NaN
6932    meta_mOTU_v25_12476.metaMG0007119_COG0016   meta_mOTU_v25_12476.metaMG0007119_COG0016   CDS <annotation product= NaN /> 1044    101 1144    +               0   0NaN
12049   meta_mOTU_v25_12476.metaMG0008193_COG0018   meta_mOTU_v25_12476.metaMG0008193_COG0018   CDS <annotation product= NaN /> 1770    101 1870    +               0   0NaN
17935   meta_mOTU_v25_12476.metaMG0016878_COG0172   meta_mOTU_v25_12476.metaMG0016878_COG0172   CDS <annotation product= NaN /> 1278    101 1378    +               0   0NaN
23435   meta_mOTU_v25_12476.metaMG0014264_COG0215   meta_mOTU_v25_12476.metaMG0014264_COG0215   CDS <annotation product= NaN /> 1410    101 1510    +               0   0NaN
27296   meta_mOTU_v25_12476.metaMG0005812_COG0495   meta_mOTU_v25_12476.metaMG0005812_COG0495   CDS <annotation product= NaN /> 2679    101 2779    +               0   0NaN
34167   meta_mOTU_v25_12476.metaMG0000504_COG0533   meta_mOTU_v25_12476.metaMG0000504_COG0533   CDS <annotation product= NaN /> 1170    101 1270    +               0   0NaN
40810   meta_mOTU_v25_12476.metaMG0006705_COG0541   meta_mOTU_v25_12476.metaMG0006705_COG0541   CDS <annotation product= NaN /> 1368    101 1468    +               0   0NaN
45570   meta_mOTU_v25_12476.metaMG0011040_COG0552   meta_mOTU_v25_12476.metaMG0011040_COG0552   CDS <annotation product= NaN /> 951     101 1051    +               0   0NaN
AlessioMilanese commented 4 years ago

Hi @zckoo007,

Thanks for your interest in mOTUs.

The meta-mOTU that you are looking for has 9 marker genes, which are the ones listed by your grep command. If you look at db_mOTU_DB_CEN.fasta you will find the FASTA sequences of these genes.
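As a rough sketch of how the 9 sequences could be pulled out of db_mOTU_DB_CEN.fasta: the helper below prints every FASTA record whose header contains a given identifier. The function name `extract_motu_genes` and the output file name are made up for illustration, and it assumes the FASTA headers carry the same identifiers as the annotations file (e.g. `meta_mOTU_v25_12476.metaMG0012618_COG0012`), which is worth checking first.

```shell
# Print every FASTA record whose header contains the given identifier.
# Assumption: headers in the FASTA file use the same names as the
# annotations file shown in the grep output.
# Usage: extract_motu_genes <identifier> <fasta_file>
extract_motu_genes() {
    awk -v id="$1" '
        /^>/ { keep = (index($0, id) > 0) }  # header line: decide whether to keep this record
        keep                                 # print header and sequence lines of kept records
    ' "$2"
}

# Example (output file name is hypothetical):
# extract_motu_genes "meta_mOTU_v25_12476." db_mOTU_DB_CEN.fasta > meta_12476_markers.fasta
```

The trailing dot in the identifier is there so that a longer number such as `_124760` would not match by accident.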

> Then I went to binning, and I assembled 100 high-quality bins. I want to find out which bins is this unknown bacteria

I would try to find these 9 genes in the 100 bins that you assembled. vsearch should do the trick; the percentage identity of a match should be >96%.
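One possible way to set this up, as a sketch only: search the 9 marker sequences against each bin separately, write a tab-separated BLAST-6 style hit table per bin, and count how many distinct marker genes have a hit at or above the identity cutoff. The vsearch call in the comment is my assumption of how such a search could be invoked and should be checked against the vsearch manual; the file names and the helper `count_marker_hits` are hypothetical. The helper only assumes that column 1 is the query (marker gene) and column 3 is the percent identity.

```shell
# Hypothetical vsearch call, one per bin (not run here; verify the options in the manual):
# vsearch --usearch_global markers.fasta --db bin_001.fa \
#         --id 0.96 --strand both --blast6out bin_001.hits.tsv

# Count how many distinct marker genes (column 1) have at least one hit with
# percent identity (column 3) at or above the threshold (default 96).
# Usage: count_marker_hits <hits.tsv> [min_identity_percent]
count_marker_hits() {
    awk -F'\t' -v min="${2:-96}" '
        $3 + 0 >= min + 0 && !seen[$1]++ { n++ }  # count each passing query gene once
        END { print n + 0 }                       # prints 0 when nothing passes
    ' "$1"
}

# Example: print the count for every hit table in the current directory
# for f in *.hits.tsv; do printf '%s\t%s\n' "$f" "$(count_marker_hits "$f")"; done
```

A bin in which most or all of the 9 genes pass the cutoff would be the natural candidate for the meta-mOTU.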

There may be better methods than vsearch, such as predicting the genes in the bins and then comparing them to the 9 genes, but that may just be extra work that is not needed.

Also, it might be easier if you first link the bins to create some Metagenome Assembled Genomes (MAGs) and then try to identify the genes. This is important if the bins are relatively small. For example, if your genome is composed of 20 bins, you can identify at most 9 of them with the previous method (since there are only 9 marker genes).

If you have a MAG (it probably also works on contigs), you can use this tool to extract the 10 universal marker genes used by mOTUs: https://github.com/AlessioMilanese/classify-genomes

Use the command: classify-genomes <fasta_file> -m marker_gene_seq.fasta

marker_gene_seq.fasta should then contain the sequences of the marker genes identified in the contig/MAG.

Hope it makes sense?

zckoo007 commented 4 years ago

Yes, very helpful!!!

zckoo007 commented 4 years ago

Does capitalization affect the result? [two images attached]

AlessioMilanese commented 4 years ago

No, but if you want to be sure you can convert everything to uppercase letters. Some tools may treat uppercase and lowercase letters differently.
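For example, a small awk one-liner can uppercase the sequence lines of a FASTA file while leaving the header lines untouched. The function name `fasta_to_upper` and the file names are only illustrative.

```shell
# Uppercase the sequence lines of a FASTA file; header lines (starting with ">")
# are printed unchanged so that sequence identifiers keep their original case.
# Usage: fasta_to_upper <fasta_file>
fasta_to_upper() {
    awk '/^>/ { print; next } { print toupper($0) }' "$1"
}

# Example (file names are hypothetical):
# fasta_to_upper bin_001.fa > bin_001.upper.fa
```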