transcript / samsa2

SAMSA pipeline, version 2.0. An open-source metatranscriptomics pipeline for analyzing microbiome data, built around DIAMOND and customizable reference databases.
GNU General Public License v3.0

Sequencing depth differs between experimental and control #73

Closed mweberr closed 2 years ago

mweberr commented 2 years ago

Dear all, I have an issue with the sequencing depth of my metatranscriptomic samples: 5 control samples with 300 million reads on average, and 5 treated samples with 60 million reads on average.

The DESeq2 package calculates sizeFactors to partially correct for the different number of reads, but I still see a lot of genes that look downregulated, potentially falsely, simply because the treated samples have fewer reads. What would you do to make the samples more similar? Do you have experience with subsampling samples?

Best, Michael

transcript commented 2 years ago

Hey Michael,

Ah, normalization - a tricky subject, because there are a number of different ways to do it. DESeq2 expects un-normalized counts as input (https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts), because it corrects for library size internally. You should still be able to get normalized counts out of the DESeqDataSet with counts(dds, normalized=TRUE), while the results function gives the log2 fold changes and adjusted p-values, as described here: https://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#p-values-and-adjusted-p-values

You could start by looking at the results export of the dds (DESeqDataSet) and checking whether the normalized values are also lower in the experimental samples than in the controls.
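For intuition about what the size factors do, here is a rough sketch of the median-of-ratios idea in plain Python. It illustrates the concept and is not DESeq2's actual code; the function names (`size_factors`, `normalize`) and the toy counts are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with a nonzero count in every sample,
    # so the log of the geometric mean is finite.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each raw count by its sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (hypothetical): sample 2 was sequenced exactly 5x deeper than sample 1.
toy_counts = [
    [10, 50],
    [20, 100],
    [30, 150],
    [0, 5],  # skipped when estimating factors because of the zero
]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts, factors)
print(factors)
print(normalized)
```

In this toy case the two size factors differ by a factor of about 5, so the first three genes end up with matching normalized values in both samples. The last gene shows the limit of this kind of correction: a gene with only a handful of reads in the deep sample can drop to zero in the shallow one, and no scaling factor brings that signal back.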

I've not tried subsampling, although you could try asking on a forum like BioStars to see if anyone has a preferred method for doing so with RNAseq comparison experiments.
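If you want to experiment before asking around, one way to subsample is to downsample each sample's count table to a common depth by drawing reads without replacement. The sketch below is untested on real data and assumes Python 3.9 or later (for the `counts` argument of `random.sample`); `downsample_counts` and the example numbers are hypothetical. It is meant for small tables, and for hundreds of millions of reads you would want something more memory-efficient, or subsample the read files before annotation.

```python
import random
from collections import Counter


def downsample_counts(counts, target_depth, seed=0):
    """Randomly draw target_depth reads, without replacement, from a count vector.

    counts: list of non-negative integer read counts, one per gene.
    Returns a new list of counts that sums to target_depth
    (or a copy of the input if it is already at or below that depth).
    """
    total = sum(counts)
    if target_depth >= total:
        return list(counts)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    # Each gene index appears in the population as many times as it has reads.
    drawn = rng.sample(range(len(counts)), k=target_depth, counts=counts)
    tally = Counter(drawn)
    return [tally.get(i, 0) for i in range(len(counts))]


# Hypothetical deep sample with 300 reads, brought down to 60.
deep_sample = [150, 90, 45, 12, 3, 0]
shallow = downsample_counts(deep_sample, target_depth=60)
print(shallow)
```

Running the downsampled control tables through the same DESeq2 analysis, ideally with a few different seeds, would show whether the apparent downregulation survives once both groups sit at a similar depth.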

Best, Sam