yelabucsf / scrna-parameter-estimation

Direct estimation of mean and covariance from single cell RNA seq experiments
MIT License

Questions about differential variance (DV) #23

Open ofarrelle opened 10 months ago

ofarrelle commented 10 months ago

Hello! I'm working my way through the paper and some code my predecessor wrote and had some clarifying questions about the outputs of binary_test_1d. I am looking at differential expression comparing a group of control cells vs a group of treatment cells. Can I write down my understanding and could you let me know at which step it might break down?

  1. Using MME, we obtain ${\hat {\mu}}_{g, \text{ctrl}}$, the mean number of UMIs attributed to gene $g$ in an arbitrary control cell. Similarly, we find ${\hat {\mu}}_{g, \text{trt}}$ for treatment cells. Comparing these two values gives a metric similar to log fold change, which is labelled $\text{de\_coef}_g$ in the results of binary_test_1d.

  2. Bootstrapping allows us to repeatedly compute new values of $\hat {\text{de\_coef}_g}$, which form a distribution $\tilde {\text{de\_coef}_g}$. The standard error estimated from this distribution is reported as $\text{de\_se}_g$ in the results of binary_test_1d. Then $\text{de\_pval}_g$ reports the p-value for testing the null hypothesis $\text{de\_coef}_g = 0$.

  3. Steps 1 and 2 are repeated for the variances $\hat{ \sigma}^2_{g, \text{ctrl}}$ and $\hat{ \sigma}^2_{g, \text{trt}}$, which leads to the outputs dv_coef, dv_se, and dv_pval.
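
To make steps 1 to 3 concrete, here is a minimal sketch of how I picture the procedure. It is not the package's implementation: it uses plain sample means and variances instead of the MME estimators, a simple nonparametric bootstrap over cells, and a normal-approximation p-value. The names `mean_var`, `log_ratios`, `two_sided_pval`, and `bootstrap_test`, as well as the toy counts, are made up for illustration; only the output names (de_coef, de_se, de_pval, dv_coef, dv_se, dv_pval) come from binary_test_1d.

```python
import math
import random
import statistics


def mean_var(counts):
    """Plain sample mean and unbiased variance (stand-ins for the MME estimates)."""
    n = len(counts)
    m = sum(counts) / n
    v = sum((c - m) ** 2 for c in counts) / (n - 1)
    return m, v


def log_ratios(ctrl, trt):
    """Log ratio (treatment over control) of the means and of the variances."""
    m_c, v_c = mean_var(ctrl)
    m_t, v_t = mean_var(trt)
    return math.log(m_t / m_c), math.log(v_t / v_c)


def two_sided_pval(coef, se):
    """Normal-approximation p-value for H0: coef = 0 (an assumption of this sketch)."""
    return math.erfc(abs(coef / se) / math.sqrt(2))


def bootstrap_test(ctrl, trt, n_boot=500, seed=0):
    """Hypothetical helper: point estimates, bootstrap SEs, and p-values for one gene."""
    rng = random.Random(seed)
    de_coef, dv_coef = log_ratios(ctrl, trt)
    de_boot, dv_boot = [], []
    for _ in range(n_boot):
        # Resample cells with replacement within each group.
        # Assumes every resample has a nonzero mean and variance.
        de_b, dv_b = log_ratios(
            rng.choices(ctrl, k=len(ctrl)),
            rng.choices(trt, k=len(trt)),
        )
        de_boot.append(de_b)
        dv_boot.append(dv_b)
    # The spread of the bootstrap replicates serves as the standard error.
    de_se = statistics.stdev(de_boot)
    dv_se = statistics.stdev(dv_boot)
    return {
        "de_coef": de_coef, "de_se": de_se, "de_pval": two_sided_pval(de_coef, de_se),
        "dv_coef": dv_coef, "dv_se": dv_se, "dv_pval": two_sided_pval(dv_coef, dv_se),
    }


# Toy UMI counts for a single gene (made-up data, not from any experiment).
data_rng = random.Random(1)
ctrl = [data_rng.randint(0, 4) for _ in range(300)]
trt = [data_rng.randint(0, 8) for _ in range(300)]
result = bootstrap_test(ctrl, trt)
print(result)
```

In this sketch the variance estimates enter only the dv_* outputs, while the de_* outputs depend on the means and on the bootstrap spread of the mean ratio. Whether the package behaves the same way is exactly what I am asking.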

I wanted to confirm that the $\hat{ \sigma}^2_{g}$ values are not used in computing de_pval. Is that right?

Additionally, could you please explain in layman's terms the importance of dv_coef? I believe this is the log fold change of the variance in UMI counts between the control cells and the treatment cells. But it seems to be common practice to subset the results of binary_test_1d to only the rows with $\text{dv\_coef} > 0$. Why is there interest in increased variance?
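
Written out, my reading of the two coefficients is the following. This is an assumption on my part, including the treatment-over-control direction of the ratios, and not something I have confirmed in the code:

$$
\text{de\_coef}_g \approx \log \frac{\hat{\mu}_{g, \text{trt}}}{\hat{\mu}_{g, \text{ctrl}}}, \qquad \text{dv\_coef}_g \approx \log \frac{\hat{\sigma}^2_{g, \text{trt}}}{\hat{\sigma}^2_{g, \text{ctrl}}}
$$

Under this reading, $\text{dv\_coef}_g > 0$ would select genes whose UMI counts are more variable in treatment cells than in control cells.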

I know this is a lot and I truly appreciate any input. Thank you!