satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Conceptual check-in: Avg Pearson residual as gene set score? #64

Closed by dpcook 4 years ago

dpcook commented 4 years ago

Hey Christoph,

Quick question. I'm just thinking about gene set scoring strategies. As an alternative to AddModuleScore's approach to scoring, do you think it makes sense to make a score from a cell's average Pearson residual for the gene set? i.e. colMeans(seurat[["SCT"]][gene_set,])

I did a quick comparison on a data set and it actually looks pretty consistent with AddModuleScore(): [image]
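
For anyone who wants to try the idea, here is a minimal sketch in base R. It uses a simulated genes x cells matrix as a stand-in for the Pearson residuals, and `score_gene_set` is a hypothetical helper name, not part of Seurat or sctransform. With a real object, the residuals are assumed to live in the `scale.data` slot of the SCT assay (e.g. `GetAssayData(seurat, assay = "SCT", slot = "scale.data")` in Seurat versions that accept `slot`), so it may be worth checking which slot your own indexing call returns. Also note that SCTransform keeps residuals only for variable genes by default (`return.only.var.genes = TRUE`), so some genes in a set may be missing.

```r
# Toy residual matrix standing in for Pearson residuals (genes x cells).
# Assumption: in a real analysis this would be the SCT assay's residual matrix.
set.seed(1)
genes <- paste0("gene", 1:20)
cells <- paste0("cell", 1:6)
resid <- matrix(rnorm(length(genes) * length(cells)),
                nrow = length(genes),
                dimnames = list(genes, cells))

# Hypothetical helper: mean residual per cell over the genes in the set.
# Genes without residuals (e.g. non-variable genes) are silently dropped.
score_gene_set <- function(resid, gene_set) {
  present <- intersect(gene_set, rownames(resid))
  if (length(present) == 0) stop("no gene in gene_set has residuals")
  colMeans(resid[present, , drop = FALSE])
}

gene_set <- c("gene2", "gene5", "gene11", "not_in_matrix")
scores <- score_gene_set(resid, gene_set)
```

`scores` is a named numeric vector with one value per cell, which could then be compared against the AddModuleScore() output for the same gene set.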

satijalab commented 4 years ago

That's an interesting point. Pearson residuals are basically asking how much of an 'outlier' a cell is for a given gene, given the average expression of that gene in the population. I never thought about it, but AddModuleScore is doing a similar thing (without an underlying statistical model). Great to see that they correlate, and I think it certainly makes sense.
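
To make the 'outlier' intuition concrete, the residual and the proposed score can be written roughly as follows. This is a sketch following the regularized negative binomial model sctransform describes; the symbols are introduced here for illustration: $x_{ij}$ is the UMI count of gene $i$ in cell $j$, $m_j$ the total UMI count of cell $j$, $\theta_i$ the gene's dispersion parameter, and $G$ the gene set.

$$
\mu_{ij} = \exp\left(\beta_{0i} + \beta_{1i} \log_{10} m_j\right), \qquad
\sigma_{ij} = \sqrt{\mu_{ij} + \frac{\mu_{ij}^2}{\theta_i}}, \qquad
z_{ij} = \frac{x_{ij} - \mu_{ij}}{\sigma_{ij}}
$$

$$
s_j = \frac{1}{|G|} \sum_{i \in G} z_{ij}
$$

Here $z_{ij}$ is the Pearson residual and $s_j$ is the gene set score for cell $j$, i.e. the column mean discussed above.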

dpcook commented 4 years ago

Thanks! Yeah, from my understanding of the methods, I thought they conceptually addressed the same question, but with different approaches. Since I'm using SCT for the rest of the analysis, I liked the idea of using the same model for generic gene set scoring, but wanted to make sure I wasn't missing anything obvious.