scverse / scvi-tools

Deep probabilistic analysis of single-cell and spatial omics data
http://scvi-tools.org/
BSD 3-Clause "New" or "Revised" License
1.25k stars 355 forks source link

feat: Add variance to ZINB model #3044

Closed rvinas closed 6 days ago

rvinas commented 1 week ago

Implemented variance of ZINB distribution

canergen commented 1 week ago

Thanks. Very appreciated. Do you have any link or math to clarify the computation?

canergen commented 1 week ago

@ori-kron-wis I checked it an it's correct. Can you add a release note and merge it after the minified PR.

codecov[bot] commented 6 days ago

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.53%. Comparing base (03acd60) to head (977d66b). Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/scvi/distributions/_negative_binomial.py 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #3044 +/- ## ========================================== - Coverage 84.53% 84.53% -0.01% ========================================== Files 178 178 Lines 15061 15062 +1 ========================================== Hits 12732 12732 - Misses 2329 2330 +1 ``` | [Files with missing lines](https://app.codecov.io/gh/scverse/scvi-tools/pull/3044?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse) | Coverage Δ | | |---|---|---| | [src/scvi/distributions/\_negative\_binomial.py](https://app.codecov.io/gh/scverse/scvi-tools/pull/3044?src=pr&el=tree&filepath=src%2Fscvi%2Fdistributions%2F_negative_binomial.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scverse#diff-c3JjL3NjdmkvZGlzdHJpYnV0aW9ucy9fbmVnYXRpdmVfYmlub21pYWwucHk=) | `83.47% <0.00%> (-0.37%)` | :arrow_down: |

🚨 Try these New Features:

rvinas commented 6 days ago

Thanks. Very appreciated. Do you have any link or math to clarify the computation?

@canergen @ori-kron-wis Apologies for the delay and thank you for accepting my pull request!

I used the following NB parameterization:

$$ f_{NB}(x; r, p) = \frac{(x+r-1)!}{x!(r-1)!} (1-p)^x p^r $$

The ZINB PMF is:

$$ f{ZINB}(x; r, p, \pi) = (x == 0)\big(\pi + (1 - \pi) f{NB}(x; r, p)\big) + (x > 0) (1 - \pi) f_{NB}(x; r, p) $$


To calculate the ZINB variance, I wrote $V[x]$ as follows

$$ V[x] = E[x²] - E[x]² = E[x(x-1)] + E[x] - E[x]² $$

and then computed $E[x]$ (the mean) and $E[x(x-1)]$.


First, here is how to calculate the mean $E[x]$ of a ZINB:

$$ \begin{align} E[x] &= \sum{x=0}^{\infty} x \cdot f{ZINB}(x; r, p, \pi) \ &= (1 - \pi) \sum_{x=1}^{\infty} x \cdot \frac{(x+r-1)!}{x!(r-1)!} (1-p)^x p^r \end{align} $$

To compute this, one can use a beautiful reparameterization trick. We proceed as follows:

$$ \begin{align} E[x] &= (1 - \pi) \sum_{x=1}^{\infty} x \cdot r \cdot \frac{((x-1)+(r + 1) -1)!}{x(x-1)! r!} (1-p) (1-p)^{(x-1)} \frac{p^{(r+1)}}{p} \ \end{align} $$

After simplifying and taking factors out of the sum we get the following:

$$ \begin{align} E[x] &= (1 - \pi)\frac{r (1-p)}{p} \sum_{x=1}^{\infty} \frac{((x-1)+(r + 1) -1)!}{(x-1)! r!} (1-p)^{(x-1)} p^{(r+1)} \ \end{align} $$

If we use the following reparameterization $\bar{x} = x - 1$ and $\bar{r}= r + 1$, we obtain:

$$ \begin{align} E[x] &= (1 - \pi)\frac{r (1-p)}{p} \sum_{\bar{x}=0}^{\infty} \frac{(\bar{x}+\bar{r} -1)!}{\bar{x}! (\bar{r}-1)!} (1-p)^{\bar{x}} p^{\bar{r}} \ \end{align} $$

And we know that the sum can be rewritten as $\sum{\bar{x}=0}^{\infty} f{NB}(\bar{x}; \bar{r}, p) = 1$. This leads to

$$ E[x] = (1 - \pi)\frac{r (1-p)}{p} $$


To calculate the expectation $E[x(x-1)]$, we can use exactly the same reparameterization technique (applied twice in a row) and we'll get the following:

$$ E[x(x-1)] = (1 - \pi)\frac{r (r+1) (1-p)²}{p²} $$


After putting all the terms together we get:

$$ V[x] = (1 - \pi) \frac{r(1-p)(1+r\pi - r\pi p)}{p²} $$

and using the NB parameterization ($\theta = r$ and $\mu=\frac{\theta (1-p)}{p}$) employed in scvi-tools, we get:

$$ V[x] = (1 - \pi) \cdot \mu \cdot \frac{(\mu + \theta + \pi \mu \theta)}{\theta} $$