thelovelab / tximport

Transcript quantification import for modular pipelines

Filtering lowly expressed genes: before or after normalization? #22

Closed: atasub closed this issue 6 years ago

atasub commented 6 years ago

Hi, I need to figure out which approach is more appropriate for filtering lowly expressed genes. The tximport manual recommends running the following commands for an edgeR analysis and then continuing with y as a DGEList object:

```r
library(edgeR)
cts <- txi$counts
normMat <- txi$length
# Per-gene length factors, divided by each gene's geometric mean length across samples
normMat <- normMat/exp(rowMeans(log(normMat)))
# Per-sample log effective library size: TMM factors times column sums of length-scaled counts
o <- log(calcNormFactors(cts/normMat)) + log(colSums(cts/normMat))
y <- DGEList(cts)
# Gene-by-sample offset: log length factor plus the per-sample term
y$offset <- t(t(log(normMat)) + o)
```

In my analysis I filtered out lowly expressed genes based on CPM (for instance, keeping genes with CPM greater than 1 in at least as many samples as the smallest group), using keep.lib.sizes=FALSE, after doing the normalization above. Now I am not sure whether this approach is appropriate, or whether I should do the normalization after filtering.
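For comparison, here is a minimal base-R sketch of the other ordering (filter first, then build the offsets), assuming simulated matrices stand in for txi$counts and txi$length. The threshold min_samples (size of the smallest group) is a hypothetical choice, and the TMM factors that calcNormFactors() would supply are replaced by 1 so the block runs without edgeR. It only shows how the count and length matrices stay aligned under a single keep vector; it does not settle which ordering is preferable.

```r
# Simulated data (assumption): stand-ins for txi$counts and txi$length
set.seed(42)
n_genes <- 1000
n_samples <- 6
cts <- matrix(rnbinom(n_genes * n_samples, mu = 200, size = 5),
              nrow = n_genes, ncol = n_samples)
cts[1:300, ] <- 0L  # make the first 300 genes unexpressed
lens <- matrix(runif(n_genes * n_samples, 500, 3000),
               nrow = n_genes, ncol = n_samples)

# CPM relative to each sample's column total
lib_size <- colSums(cts)
cpm_mat <- sweep(cts, 2, lib_size, "/") * 1e6

# Keep genes with CPM > 1 in at least min_samples samples (hypothetical value)
min_samples <- 3
keep <- rowSums(cpm_mat > 1) >= min_samples

# Subset counts and lengths with the same logical vector
cts_f <- cts[keep, , drop = FALSE]
lens_f <- lens[keep, , drop = FALSE]

# Length factors as in the recipe: divide by each gene's geometric mean length
normMat <- lens_f / exp(rowMeans(log(lens_f)))

# Per-sample term; calcNormFactors(cts_f / normMat) would replace these 1s (assumption)
norm_factors <- rep(1, n_samples)
o <- log(norm_factors) + log(colSums(cts_f / normMat))

# Offset matrix, equivalent to t(t(log(normMat)) + o)
offset <- sweep(log(normMat), 2, o, "+")
```

Because the per-sample term is computed from column sums of the retained genes, its value depends on whether filtering happens before or after this step; the per-gene length factors do not, since each row is scaled independently.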

Thanks for your help. Best,

mikelove commented 6 years ago

hi, please see this note:

https://github.com/mikelove/tximport/issues/19