myles-lewis / glmmSeq

Gene-level general linear mixed model
https://myles-lewis.github.io/glmmSeq/

sizeFactors instructions in vignette #24

Closed: zktuong closed this issue 2 years ago

zktuong commented 2 years ago

hi,

in your vignette under size factors,

when using:

sizeFactors <- calcNormFactors(counts, method="TMM")

the sizeFactors calculation should have an additional step, like:

sizeFactors <- dgelist$samples$lib.size * dgelist$samples$norm.factors

because in the calcNormFactors details, they explicitly state:

This function computes scaling factors to convert observed library sizes into effective library sizes. The effective library sizes for use in downstream analysis are lib.size * norm.factors where lib.size contains the original library sizes and norm.factors is the vector of scaling factors computed by this function.

https://rdrr.io/bioc/edgeR/man/calcNormFactors.html
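To make the quoted definition concrete, here is a minimal base R sketch of the effective library size calculation. The sample names and numbers are made up for illustration; in practice `lib_size` and `norm_factors` would presumably come from the `lib.size` and `norm.factors` columns of an edgeR `DGEList`'s `samples` table.

```r
# Hypothetical per-sample values (not real data); they stand in for
# dgelist$samples$lib.size and dgelist$samples$norm.factors.
lib_size     <- c(s1 = 1.0e7, s2 = 1.2e7, s3 = 0.8e7)
norm_factors <- c(s1 = 1.05,  s2 = 0.95,  s3 = 1.00)

# Effective library size as defined in the calcNormFactors details:
# lib.size * norm.factors
eff_lib_size <- lib_size * norm_factors

# The norm factors sit close to 1; the effective library sizes are in the millions.
print(norm_factors)
print(eff_lib_size)
```

Note the difference in scale between the two vectors: that difference is what the rest of this thread is about.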

related to #20

myles-lewis commented 2 years ago

Hi Kelvin,

Thanks for the comment. Just to be clear: while sizeFactors are often stated or estimated as total library size (i.e. the sum of all counts in a sample), for fitting the glmer model it's important that we use the normFactors directly, because we need them to be close to 1 for most samples.

sizeFactors are passed into the model as offset = log(sizeFactors). Large values for sizeFactors tend to cause problems with fitting the model. DESeq2 does the same thing: its estimateSizeFactors function produces size factors centred around 1.

So the additional step of multiplying by lib.size is not required, and in fact it's better not to inflate the sizeFactors.

Myles
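As a rough illustration of the offset argument, the sketch below fits a plain Poisson `glm` from base R twice: once with factors near 1 and once with the same factors multiplied by a constant library size. This is an assumption-laden stand-in, not glmmSeq's actual code: it uses `glm` rather than `glmer`, simulated counts, and a hypothetical library size that is identical across samples. In that simplified setting, multiplying the factors by a constant only shifts the intercept by the log of that constant, and the group effect is unchanged.

```r
set.seed(1)
n <- 40
group <- rep(c(0, 1), each = n / 2)

# Hypothetical per-sample factors close to 1 (the scale norm factors usually have)
norm_factors <- runif(n, 0.8, 1.2)

# Hypothetical raw library size, identical for every sample to keep things simple
lib_size <- 1e7

# Simulated counts for one gene: true intercept 2, true group effect 0.5
counts <- rpois(n, lambda = norm_factors * exp(2 + 0.5 * group))

# Offset on the scale of the norm factors (values near 1, log near 0)
fit_small <- glm(counts ~ group, family = poisson,
                 offset = log(norm_factors))

# Offset inflated by the library size (log offset around 16)
fit_large <- glm(counts ~ group, family = poisson,
                 offset = log(norm_factors * lib_size))

# Same group coefficient; intercepts differ by log(lib_size)
print(coef(fit_small))
print(coef(fit_large))
```

A simple `glm` copes with the large offset here. The point made above is that the mixed-model fit in glmer can be less forgiving when the offset pushes the intercept far from zero, which is why factors near 1 are preferred even though the group effect is the same on paper.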