satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
Mean and variance of normalized gene expression #54

Open jmbreda opened 4 years ago

jmbreda commented 4 years ago

Hi,

Thanks for this nice and convenient tool.

Is there a good way to estimate the "true" mean and variance of genes with sctransform?

I assume that calculating the mean directly on the residuals won't work, since the Pearson residuals correspond to a difference between the expected UMI count (free of sequencing depth) and the actual count? Should I then take the mean of the expected count \mu_{ij}? If so, should I compute \mu_{ij} from the parameters in [vst_object]$model_pars_fit?

Concerning the variance, should I use sctransform::get_model_var rather than calculating the variance directly on the residuals?

Best, Jeremie

ChristophH commented 4 years ago

Hi Jeremie,

How exactly would you define "true" mean and variance? If you are looking for mean and variance on the scale of the original counts with the effect of sequencing depth removed, you could set the return_corrected_umi parameter of vst() to TRUE and compute them directly on the resulting umi_corrected matrix. This will give you the mean and variance at median sequencing depth.
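
A minimal sketch of that suggestion, in case it helps. It assumes the small `pbmc` example matrix bundled with sctransform as input (any genes x cells UMI count matrix should work the same way), and the variable names are only illustrative. Note that vst() drops genes detected in very few cells by default, so the result can have fewer rows than the input.

```r
library(Matrix)
library(sctransform)

# Example UMI count matrix (genes x cells); assumed here to be the
# 'pbmc' dataset that ships with sctransform.
umi <- sctransform::pbmc

# Fit the regularized NB model and ask for depth-corrected counts.
vst_out <- sctransform::vst(umi, return_corrected_umi = TRUE)

# Sparse genes x cells matrix of corrected UMI counts.
corrected <- vst_out$umi_corrected

# Per-gene mean and sample variance, computed without densifying the matrix.
n_cells   <- ncol(corrected)
gene_mean <- Matrix::rowMeans(corrected)
gene_var  <- (Matrix::rowMeans(corrected^2) - gene_mean^2) * n_cells / (n_cells - 1)

head(data.frame(mean = gene_mean, variance = gene_var))
```

Both statistics are on the count scale and refer to the corrected counts, not to the Pearson residuals.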

If you are wondering about the mean and variance in the scenario where every cell was sequenced to saturation, then this is not going to give you the answer. The sctransform approach with log_umi as the technical factor does not extrapolate to values outside the input range. I am not aware of a tool that would do this, and I am not sure it's even possible.