stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

added Gamma distribution #295

Closed · eco3 closed this 1 year ago

eco3 commented 1 year ago

Added the Gamma distribution with the PDF

$$ f(y;\alpha, \beta) = \frac{1}{\Gamma(\alpha)} \frac{(\beta y)^\alpha}{y} \mathrm{e}^{-\beta y}. $$

I used SciPy's gamma distribution for this implementation. Both parameters must be positive, $\eta = (\alpha, \beta) > 0$, so I reparametrized them as $\theta = (\log \alpha, \log \beta)$.
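For illustration, here is a minimal sketch of that parameter mapping onto SciPy's gamma distribution (the helper name `gamma_from_params` is hypothetical, not the actual PR code; SciPy's gamma takes a shape `a` $= \alpha$ and `scale` $= 1/\beta$):

```python
import numpy as np
from scipy.stats import gamma as dist

def gamma_from_params(params):
    # params holds the internal parameters theta = (log alpha, log beta);
    # exponentiating guarantees alpha, beta > 0.
    log_alpha, log_beta = params
    alpha, beta = np.exp(log_alpha), np.exp(log_beta)
    # SciPy parametrizes by shape and scale, so the rate beta becomes scale = 1/beta.
    return dist(a=alpha, scale=1.0 / beta)

d = gamma_from_params([0.5, -1.0])
print(d.mean())  # = alpha / beta
```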

For the LogScore I calculated the derivatives with respect to $\log \alpha$ and $\log \beta$:

$$ \begin{align} \frac{\partial (-\log f)}{\partial \log \alpha} &= \alpha\left(\psi^{(0)}(\alpha) - \log(\beta y)\right) \\ \frac{\partial (-\log f)}{\partial \log \beta} &= \beta y - \alpha \end{align} $$

Here $\psi^{(m)}(z)$ is the polygamma function of order $m$, which SciPy provides; $\psi^{(0)}(z)$ is the digamma function, also available in SciPy.
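As a sanity check, the hand-derived gradient can be compared against a central finite difference of the negative log-likelihood. A rough sketch (function names are illustrative only, not the PR's API):

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import gamma

def neg_log_lik(theta, y):
    # LogScore: negative log-likelihood in theta = (log alpha, log beta).
    alpha, beta = np.exp(theta)
    return -gamma.logpdf(y, a=alpha, scale=1.0 / beta)

def grad(theta, y):
    # Closed-form derivatives from above.
    alpha, beta = np.exp(theta)
    d_log_alpha = alpha * (digamma(alpha) - np.log(beta * y))
    d_log_beta = beta * y - alpha
    return np.array([d_log_alpha, d_log_beta])

theta, y, eps = np.array([0.3, -0.7]), 2.5, 1e-6
num = np.array([
    (neg_log_lik(theta + eps * e, y) - neg_log_lik(theta - eps * e, y)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(grad(theta, y), num))  # True
```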

The Fisher information matrix of the gamma distribution is

$$ I_{\eta}(\eta) = \begin{pmatrix} \psi^{(1)}(\alpha) & -\beta^{-1} \\ -\beta^{-1} & \alpha \beta^{-2} \end{pmatrix}. $$

Reparametrizing with the Jacobian $J = \partial\theta / \partial\eta = \operatorname{diag}(\alpha^{-1}, \beta^{-1})$, so that $J^{-1} = \operatorname{diag}(\alpha, \beta)$, yields:

$$ \begin{align} I_{\theta}(\theta) &= J^{-1}\, I_{\eta}(\eta)\, J^{-1} \\ &= \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix} \begin{pmatrix} \psi^{(1)}(\alpha) & -\beta^{-1} \\ -\beta^{-1} & \alpha \beta^{-2} \end{pmatrix} \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix} \\ &= \begin{pmatrix} \alpha^2 \psi^{(1)}(\alpha) & -\alpha \\ -\alpha & \alpha \end{pmatrix} \end{align} $$
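A quick numerical check of this matrix product (illustrative only, using `scipy.special.polygamma` for $\psi^{(1)}$):

```python
import numpy as np
from scipy.special import polygamma

alpha, beta = 1.7, 0.4

# Fisher information in the natural parameters eta = (alpha, beta).
I_eta = np.array([[polygamma(1, alpha), -1.0 / beta],
                  [-1.0 / beta, alpha / beta**2]])

# J = d(theta)/d(eta) = diag(1/alpha, 1/beta), so J^{-1} = diag(alpha, beta).
J_inv = np.diag([alpha, beta])
I_theta = J_inv @ I_eta @ J_inv

closed_form = np.array([[alpha**2 * polygamma(1, alpha), -alpha],
                        [-alpha, alpha]])
print(np.allclose(I_theta, closed_form))  # True
```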

I'm no expert in maths, so somebody should double-check my calculations. I added the distribution to the tests, but I'm not sure if I missed something. The pytests passed on my machine.

ryan-wolbeck commented 1 year ago

@alejandroschuler can you review the math here and I'll focus on implementation?

alejandroschuler commented 1 year ago

@eco3 thanks for the contribution! Could you use wolframalpha.com to verify the derivations please?

eco3 commented 1 year ago

> @eco3 thanks for the contribution! Could you use wolframalpha.com to verify the derivations please?

I've already posted them in the original post; here they are: df/da and df/db. I then substituted $\alpha$ and $\beta$ with $e^{\log \alpha}$ and $e^{\log \beta}$. This yields: df/da substituted and df/db substituted.
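For anyone who prefers a scriptable check over WolframAlpha, the same verification can be sketched in SymPy (just a sketch of the check, not part of the PR):

```python
import sympy as sp

y = sp.symbols('y', positive=True)
la, lb = sp.symbols('log_alpha log_beta', real=True)
alpha, beta = sp.exp(la), sp.exp(lb)

# Negative log-likelihood of the Gamma PDF above.
neg_log_f = sp.loggamma(alpha) - alpha * sp.log(beta * y) + sp.log(y) + beta * y

# Both differences simplify to zero if the hand-derived gradients are correct.
print(sp.simplify(sp.diff(neg_log_f, la)
                  - alpha * (sp.polygamma(0, alpha) - sp.log(beta * y))))  # 0
print(sp.simplify(sp.diff(neg_log_f, lb) - (beta * y - alpha)))            # 0
```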

alejandroschuler commented 1 year ago

@eco3 oh, so sorry about that, I totally missed the links :)

ryan-wolbeck commented 1 year ago

@eco3 can you please run black code formatting on the PR (https://black.readthedocs.io/en/stable/usage_and_configuration/the_basics.html)? That should take care of the failed builds.

eco3 commented 1 year ago

> @eco3 can you please run black code formatting on the PR (https://black.readthedocs.io/en/stable/usage_and_configuration/the_basics.html)? That should take care of the failed builds.

After running black for formatting, there are still some failed checks; see here.

Is this related to this issue?

ryan-wolbeck commented 1 year ago

@eco3 that looks like a likely culprit to me. I'll take a deeper look at this during the week and get back to you (likely we just need to bump the version of black to latest).

ryan-wolbeck commented 1 year ago

@eco3 we are working on #296, which should fix this issue. Once it gets merged, you'll need to pull the new master into your repo and merge it in here, then run black again (with the proper version locally) and it should pass.