zjdaye / MDSeq

MDSeq: Gene expression mean and variability analysis for RNA-seq counts
Other
6 stars 3 forks source link

Calculation of normalised counts #3

Open aedanr opened 4 years ago

aedanr commented 4 years ago

Hi, I think there is an error in the calculation of normalised counts using normalize.counts() when using TMM or RLE.

The function gets normalisation factors using edgeR's calcNormFactors() and divides the counts by the resulting factors, but the factors returned by calcNormFactors should be applied to library sizes, not directly to counts. I think that the factors (nf) calculated in the vignette to be used as offsets are the factors that should be used to directly normalise counts, i.e.:

cnf <- calcNormFactors(dat.filtered, method="TMM")
libsize <- colSums(dat.filtered)
rellibsize <- libsize/exp(mean(log(libsize)))
nf <- cnf * rellibsize

In normalize.counts(), these could be obtained as:

y$samples$norm.factors * y$samples$lib.size / exp(mean(log(y$samples$lib.size))) rather than just using y$samples$norm.factors.

Please let me know if I've misinterpreted something here, but I think this is right. Thanks.

zjdaye commented 4 years ago

Hi Aedan,

Thanks for your comment! We used the simple convention of normalizing with normalized count = raw count / normalization factor that had been used by earlier authors. I had discussed your comment with my co-author Di Ran. His view is that one might correct the normalization factor with library size by ef <- lib.size * norm.factors # effective library size that is on the scale of count nf.final <- ef/ exp(mean(log(ef))) # normalize around 1 and replace norm.factors with nf.final for normalization.

Hope this is helpful!

Best, John.