Are the rRNA gene sequences removed?

jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis

GNU General Public License v3.0


Are the rRNA gene sequences removed? #712

Closed: yjiakang closed this issue 1 year ago

yjiakang commented 1 year ago

Hi, thanks for your excellent pipeline. I am wondering whether rRNA gene sequences are removed. My data are metatranscriptomic sequences. If so, is all rRNA removed, or only 16S rRNA sequences? Thanks for taking the time to answer this question.

fpusan commented 1 year ago

They are not removed; if they are assembled, they will be present in the results. rRNA sequences (and other ORFs that do not encode proteins) will be shown as "no_CDS" in the figures from SQMtools.
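If you want to keep these entries out of a downstream summary, one option is to filter them from your gene table before computing abundances. The sketch below assumes a hypothetical list of gene records with an `annotation` field in which non-protein-coding genes carry the label "no_CDS"; the `Gene` record, its field names, and the toy values are illustrative assumptions, not SqueezeMeta's actual output format.

```python
from dataclasses import dataclass

# Hypothetical gene record: field names are assumptions for illustration,
# not the column names of any SqueezeMeta output table.
@dataclass
class Gene:
    gene_id: str
    annotation: str   # a functional label, or "no_CDS" for non-protein-coding genes
    abundance: float  # e.g. reads mapped to the gene

NON_CODING_LABEL = "no_CDS"

def split_coding(genes):
    """Separate protein-coding genes from non-coding ones (e.g. rRNA)."""
    coding = [g for g in genes if g.annotation != NON_CODING_LABEL]
    non_coding = [g for g in genes if g.annotation == NON_CODING_LABEL]
    return coding, non_coding

def non_coding_fraction(genes):
    """Fraction of the total abundance assigned to non-coding genes."""
    total = sum(g.abundance for g in genes)
    if total == 0:
        return 0.0
    _, non_coding = split_coding(genes)
    return sum(g.abundance for g in non_coding) / total

if __name__ == "__main__":
    # Toy records, made up for the example.
    genes = [
        Gene("g1", "funcA", 30.0),
        Gene("g2", "no_CDS", 60.0),  # e.g. an assembled rRNA gene
        Gene("g3", "funcB", 10.0),
    ]
    coding, _ = split_coding(genes)
    print([g.gene_id for g in coding])  # ['g1', 'g3']
    print(non_coding_fraction(genes))   # 0.6
```

Computing the non-coding fraction before dropping those rows can also serve as a quick check of how much of a metatranscriptome's signal comes from rRNA and other non-coding genes.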

yjiakang commented 1 year ago

Thanks for your quick reply. Another question: is the taxonomy inferred from 16S rRNA in both metagenomics and metatranscriptomics? If so, does that mean metatranscriptomic data enriched only for mRNA are not suitable for this pipeline?

fpusan commented 1 year ago

The taxonomy is actually inferred from protein-coding genes, not 16S rRNA genes (though we also classify the 16S sequences, the result is not used further). So SqueezeMeta works perfectly with mRNAs.
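To illustrate what inferring taxonomy from protein-coding genes can look like, a common generic approach is a last-common-ancestor (LCA) rule: a gene is assigned to the deepest rank shared by the lineages of its protein hits. The sketch below is a minimal illustration of that rule under this assumption, not SqueezeMeta's actual implementation; the function name `lca` and the toy lineages are hypothetical.

```python
def lca(lineages):
    """Return the deepest rank path shared by all lineages (last common ancestor).

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    if not lineages:
        return []
    shared = []
    # zip(*...) walks the ranks in parallel and stops at the shortest lineage.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break  # first disagreement: the assignment stops at the previous rank
    return shared

if __name__ == "__main__":
    # Toy lineages of two protein hits for one gene (illustrative only).
    hits = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales"],
    ]
    print(lca(hits))  # ['Bacteria', 'Proteobacteria', 'Gammaproteobacteria']
```

Because a rule like this only needs protein hits, it applies in the same way whether the genes were assembled from DNA or from mRNA.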

yjiakang commented 1 year ago

That is great. Thanks a lot.

Panda-smile commented 8 months ago

Hello, can this be used to analyze metatranscriptomes? Could you give some guidance on how to run the scripts?

jtamames commented 8 months ago

Yes, you can analyze metatranscriptomes in the same way as metagenomes. Check the manual to see how the program is run.

chrismitbiz commented 6 months ago

> The taxonomy is actually inferred from protein-coding genes, not 16S rRNA genes (though we also classify the 16S sequences, the result is not used further). So SqueezeMeta works perfectly with mRNAs.

Hi! I was wondering why 16S rRNA genes are predicted if they are not used further. Thanks!

fpusan commented 6 months ago

Hi! This is mostly because rRNA genes do not assemble reliably from short metagenomic reads. 16S is highly conserved, meaning that different organisms have relatively similar sequences. This can lead the assembler to collapse several taxa into a single contig. So while we predict them and do some basic annotation (since it does not take much time), we do not use them as a source of taxonomic annotation in our pipeline.
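Why conservation causes collapsing can be illustrated with shared k-mers, the units that de Bruijn graph assemblers build their graphs from: when two sequences share most of their k-mers, their paths through the graph overlap and are easy to merge. The sketch below assumes random toy sequences (not real 16S genes), an illustrative k of 21, and a hypothetical "related" copy that differs at 10 of 1,000 positions; it only measures k-mer overlap and does not simulate any particular assembler.

```python
import random

def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def containment(query, reference, k):
    """Fraction of the query's distinct k-mers that also occur in the reference."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(reference, k)) / len(q)

def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}  # always changes the base
    bases = list(seq)
    for p in positions:
        bases[p] = swap[bases[p]]
    return "".join(bases)

rng = random.Random(42)  # fixed seed so the toy sequences are reproducible
gene_a = "".join(rng.choice("ACGT") for _ in range(1000))    # toy gene from "taxon A"
gene_b = mutate(gene_a, range(50, 1000, 100))                # 10 substitutions (1% divergence)
unrelated = "".join(rng.choice("ACGT") for _ in range(1000)) # unrelated toy sequence

K = 21  # illustrative k-mer size
print(f"related pair:   {containment(gene_b, gene_a, K):.2f}")
print(f"unrelated pair: {containment(unrelated, gene_a, K):.2f}")
```

Even with 1% divergence, roughly four out of five k-mers of the related copy also occur in the original, while the unrelated sequence shares essentially none. Real 16S genes from different taxa also contain long, nearly identical conserved regions, which is what makes them hard to keep apart during assembly.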