satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Understanding the purpose of corrected counts #179

Closed: danielcgingerich closed this issue 5 months ago

danielcgingerich commented 5 months ago

I'm trying to understand what exactly corrected counts are. I don't see how the corrected counts differ from the raw count matrix. I've looked at the source code for correct_counts in this repository to try to identify the source of my misunderstanding. From line 237 in denoise.R:

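```r
# Pearson residuals, computed from the original design matrix (regressor_data_orig)
mu <- exp(tcrossprod(coefs, regressor_data_orig))
variance <- mu + mu^2 / theta
y <- as.matrix(umi[genes_bin, , drop=FALSE])
pearson_residual <- (y - mu) / sqrt(variance)
# generate output
# mu and variance are recomputed here from regressor_data, not regressor_data_orig
mu <- exp(tcrossprod(coefs, regressor_data))
variance <- mu + mu^2 / theta
y.res <- mu + pearson_residual * sqrt(variance)
```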

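Substituting `pearson_residual` into the last line, with $r$ = `pearson_residual`, $\mu$ = `mu` and $\sigma^2$ = `variance`:

$$\mu + r\sqrt{\sigma^2} = \mu + \frac{y - \mu}{\sqrt{\sigma^2}}\sqrt{\sigma^2} = \mu + y - \mu = y$$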

What's the purpose of this?
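The cancellation above assumes both halves of the excerpt use the same `mu` and `variance`. Keeping them apart (the subscripts are my own notation: "orig" for quantities computed from `regressor_data_orig`, "new" for those computed from `regressor_data`), the last line of the excerpt works out to:

$$
\begin{aligned}
r &= \frac{y - \mu_{\text{orig}}}{\sigma_{\text{orig}}}, \qquad
\sigma^2_{\text{orig}} = \mu_{\text{orig}} + \frac{\mu_{\text{orig}}^2}{\theta}, \qquad
\sigma^2_{\text{new}} = \mu_{\text{new}} + \frac{\mu_{\text{new}}^2}{\theta} \\
\tilde{y} &= \mu_{\text{new}} + r\,\sigma_{\text{new}}
= \mu_{\text{new}} + \frac{\sigma_{\text{new}}}{\sigma_{\text{orig}}}\left(y - \mu_{\text{orig}}\right)
\end{aligned}
$$

Here $\tilde{y}$ is `y.res`. It collapses to $y$ when $\mu_{\text{new}} = \mu_{\text{orig}}$, which is the case the substitution above implicitly assumes; otherwise the residual is re-expressed around the new mean and on its scale.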

saketkc commented 5 months ago

This is explained in the paper. Briefly, the idea is to put all cells at the same sequencing depth (the median) and then ask what counts would be observed, with minimal technical variation (i.e. mostly biological variance), given that the model captures the technical variation.
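A toy, self-contained R reproduction of the excerpt from the question can make the reply concrete. Everything in it is a hypothetical stand-in: a single gene, four cells, made-up `coefs`, `theta` and log10 UMI totals, and a model whose only covariate is `log_umi`. Building `regressor_data` by putting every cell at the median `log_umi` is my reading of the reply above, not a copy of the package code.

```r
# Toy version of the denoise.R excerpt: one gene, four cells.
# Hypothetical model: log(mu) = b0 + b1 * log10(total UMIs); all values are made up.
coefs <- matrix(c(-3, 1.2), nrow = 1)   # columns: intercept, log_umi
theta <- 10
log_umi <- c(3, 4, 3, 4)                # two shallow cells and two 10x deeper cells

# Design matrices: each cell's observed depth vs. every cell at the median depth
regressor_data_orig <- cbind(1, log_umi)
regressor_data <- cbind(1, rep(median(log_umi), length(log_umi)))

# Counts: cells 1-2 sit exactly at the model mean, cells 3-4 at twice the mean
mu_orig <- exp(tcrossprod(coefs, regressor_data_orig))
y <- mu_orig * c(1, 1, 2, 2)

# First half of the excerpt: Pearson residuals at each cell's observed depth
var_orig <- mu_orig + mu_orig^2 / theta
pearson_residual <- (y - mu_orig) / sqrt(var_orig)

# Plugging the residuals back into the SAME mean/variance just returns y
y_back <- mu_orig + pearson_residual * sqrt(var_orig)

# Second half of the excerpt: mean/variance recomputed at the median depth
mu <- exp(tcrossprod(coefs, regressor_data))
variance <- mu + mu^2 / theta
y.res <- mu + pearson_residual * sqrt(variance)

print(round(rbind(observed = y[1, ],
                  reconstructed = y_back[1, ],
                  corrected = y.res[1, ]), 2))
```

Reusing the same `mu` and `variance` gives `y` back, which is the identity derived in the question; the excerpt avoids it because its second `mu` comes from `regressor_data`. After that swap, the two cells sitting exactly at the model mean get the same corrected value despite a 10-fold depth difference, and the two cells with twice the expected expression stay above them, even though in the raw counts the deep "baseline" cell outranks the shallow "2x" cell.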

danielcgingerich commented 1 month ago

@saketkc Thank you!