satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Understanding the purpose of corrected counts #179

Closed danielcgingerich closed 10 months ago

danielcgingerich commented 10 months ago

I'm trying to understand what exactly corrected counts are. I do not see how the corrected counts differ from the raw count matrix. I've looked at the source code for correct_counts in this repository and identified the source of my misunderstanding, starting at line 237 in denoise.R:

mu <- exp(tcrossprod(coefs, regressor_data_orig))
variance <- mu + mu^2 / theta
y <- as.matrix(umi[genes_bin, , drop=FALSE])
pearson_residual <- (y - mu) / sqrt(variance)
# generate output
mu <- exp(tcrossprod(coefs, regressor_data))
variance <- mu + mu^2 / theta
y.res <- mu + pearson_residual * sqrt(variance)

mu + pearson_residual * sqrt(variance) = mu + (y - mu) / sqrt(variance) * sqrt(variance) = mu + y - mu = y.

What's the purpose of this?

saketkc commented 10 months ago

This is explained in the paper. Briefly, the idea is to set all cells to the same sequencing depth (the median) and then ask what counts would have been observed, with minimal technical variation and mostly biological variance remaining, given that the model captures the technical variation.
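To make this concrete: in the quoted snippet, the first mu is computed from regressor_data_orig and the second from regressor_data, so the two variance terms only cancel when both hold the same depth. Below is a minimal single-gene sketch of that idea. The coefficients, theta, depths, and counts are made-up numbers, and the variable names mu_orig, mu_med, and y_res are introduced here for illustration; this is not the package's own code.

```r
# Toy illustration for one gene across five cells; every number here is invented.
log_umi <- log10(c(500, 1000, 2000, 4000, 8000))  # per-cell sequencing depth
n_cells <- length(log_umi)

# Hypothetical fitted parameters: intercept, slope on log_umi, and NB theta.
coefs <- matrix(c(-6, 2), nrow = 1)
theta <- 10

# Observed UMI counts for the gene.
y <- matrix(c(0, 1, 3, 5, 12), nrow = 1)

# Step 1: Pearson residuals under each cell's ORIGINAL depth.
regressor_data_orig <- cbind(1, log_umi)
mu_orig <- exp(tcrossprod(coefs, regressor_data_orig))
var_orig <- mu_orig + mu_orig^2 / theta
pearson_residual <- (y - mu_orig) / sqrt(var_orig)

# Step 2: map the residuals back to counts, but with every cell's depth
# replaced by the MEDIAN depth. mu and variance therefore differ from step 1.
regressor_data <- cbind(1, rep(median(log_umi), n_cells))
mu_med <- exp(tcrossprod(coefs, regressor_data))
var_med <- mu_med + mu_med^2 / theta
y_res <- mu_med + pearson_residual * sqrt(var_med)

# Sanity check: reusing the original depth in step 2 just returns y.
y_same <- mu_orig + pearson_residual * sqrt(var_orig)

print(rbind(observed = as.vector(y), corrected = as.vector(y_res)))
```

In this toy setup, only the cell whose depth already equals the median keeps its observed count; for the others, the residual is re-expressed at the median depth, so y_res differs from y. The result is generally not an integer and can be negative, so some rounding and clipping at zero would be a natural follow-up step (stated here as an assumption about post-processing, not taken from the snippet above).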

danielcgingerich commented 5 months ago

@saketkc Thank you!