sktime / skpro

A unified framework for tabular probabilistic regression and probability distributions in python
https://skpro.readthedocs.io/en/latest
BSD 3-Clause "New" or "Revised" License
231 stars 45 forks

[ENH] interface `TweedieRegressor` from `sklearn` as `skpro` regressor #423

Open fkiraly opened 2 months ago

fkiraly commented 2 months ago

We should try to interface TweedieRegressor from sklearn as an skpro regressor. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.TweedieRegressor.html

Notes on implementation:

FYI @ShreeshaM07, this is very similar to your previous work on statsmodels GLM!

ShreeshaM07 commented 2 months ago

Some points regarding the same

A question regarding the TweedieRegressor: is it not just an interface to regressors for different families, e.g., Poisson, Gaussian, Gamma? If so, is there any difference in implementing the TweedieRegressor if it is just going to expose these different regressors?

fkiraly commented 2 months ago

To answer these:

> is it not just an interface to possible regressors for different families, e.g., Poisson, Gaussian, Gamma

Yes, but for non-integer values of the p parameter these are very specific families that are not available yet. It is a good question whether the distribution should internally decompose into these case distinctions.

ShreeshaM07 commented 2 months ago

This scipy issue discusses the Tweedie distribution: https://github.com/scipy/scipy/issues/11291#issuecomment-1868256070 and concludes that the scipy interface is not general enough because the distribution is of mixed type. skpro is general enough, so with the pointers in there we could implement it, either entirely from scratch, or by interfacing some of the component functions such as Bessel.

From the conversation I can infer that we can implement this in skpro, as it allows for mixed type distributions with pdf and pmf in different intervals. https://lorentzen.ch/index.php/2024/06/17/a-tweedie-trilogy-part-iii-from-wrights-generalized-bessel-function-to-tweedies-compound-poisson-distribution/ seems to be a very informative post explaining the Tweedie distribution. It also gives a code snippet for the pmf (the point mass at zero) and the pdf of the compound Poisson-gamma distribution.

import numpy as np
from scipy.special import wright_bessel

def cpg_pmf(mu, phi, p):
    """Compound Poisson Gamma point mass at zero."""
    return np.exp(-np.power(mu, 2 - p) / (phi * (2 - p)))

def cpg_pdf(x, mu, phi, p):
    """Compound Poisson Gamma pdf."""
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    theta = np.power(mu, 1 - p) / (1 - p)
    kappa = np.power(mu, 2 - p) / (2 - p)
    alpha = (2 - p) / (1 - p)
    t = ((p - 1) * phi / x)**alpha
    t /= (2 - p) * phi
    a = 1 / x * wright_bessel(-alpha, 0, t)
    return a * np.exp((x * theta - kappa) / phi)

This can be used together with the wright_bessel function in scipy.special.
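As a quick sanity check of the snippet above, the point mass at zero and the integral of the density over x > 0 should add up to one. The sketch below repeats the two functions so that it runs on its own; the parameter values and the finite integration limit are illustrative assumptions, not values from the thread.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import wright_bessel


def cpg_pmf(mu, phi, p):
    """Compound Poisson Gamma point mass at zero."""
    return np.exp(-np.power(mu, 2 - p) / (phi * (2 - p)))


def cpg_pdf(x, mu, phi, p):
    """Compound Poisson Gamma pdf for x > 0."""
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    theta = np.power(mu, 1 - p) / (1 - p)
    kappa = np.power(mu, 2 - p) / (2 - p)
    alpha = (2 - p) / (1 - p)
    t = ((p - 1) * phi / x) ** alpha
    t /= (2 - p) * phi
    a = 1 / x * wright_bessel(-alpha, 0, t)
    return a * np.exp((x * theta - kappa) / phi)


# Illustrative parameter values (assumptions, not taken from the thread).
mu, phi, p = 1.0, 1.0, 1.5

# Integrate the continuous part over a finite range. The upper limit of 30
# is an assumption: for these parameters the density is negligible beyond it.
continuous_mass, _ = quad(cpg_pdf, 0, 30, args=(mu, phi, p))

# Point mass at zero plus continuous mass should be close to 1.
total_mass = cpg_pmf(mu, phi, p) + continuous_mass
print(total_mass)
```

A finite upper limit is used instead of infinity because the Wright function grows quickly for large arguments, which could overflow before the exponential factor damps it.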

For the sklearn Tweedie regressor, the remaining question is still where to get the scale from. It would not be much of a Tweedie regressor if that would be impossible to obtain...
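One common way to obtain a scale in generalized linear models, offered here as a possible option and not as something the thread settles on, is the Pearson estimate of the dispersion computed from the fitted means. The helper name `pearson_dispersion` and the toy numbers are hypothetical; in practice `mu_hat` would be the output of the regressor's `predict` and `n_params` the number of fitted coefficients including the intercept.

```python
import numpy as np


def pearson_dispersion(y, mu_hat, p, n_params):
    """Pearson estimate of the Tweedie dispersion phi (hypothetical helper).

    Uses the Tweedie variance function V(mu) = mu**p:
        phi_hat = sum((y - mu_hat)**2 / mu_hat**p) / (n - n_params)
    """
    y = np.asarray(y, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    dof = y.size - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    return np.sum((y - mu_hat) ** 2 / mu_hat ** p) / dof


# Toy values for illustration only (not from the thread).
y = [2.0, 4.0, 3.0]
mu_hat = [1.0, 2.0, 3.0]
phi_hat = pearson_dispersion(y, mu_hat, p=1.0, n_params=1)
print(phi_hat)
```

This yields a single scale shared across all samples, which matches the usual GLM assumption of a constant dispersion parameter.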

I think there is a very roundabout way to do this by passing the x value to PoissonRegressor and GammaRegressor separately and finding out the values of lambda, a and b. [image] As we know the mean (the return value of predict) and the power parameter p is fixed, we can calculate phi, i.e., the scale, using the formula below. Is it not possible that way?
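For reference, the standard correspondence between the Tweedie parameters (mu, phi, p) with 1 < p < 2 and a compound Poisson-gamma model can be sketched as below. The function names are hypothetical, and the gamma parameters are called `shape` and `scale` here; how they map to the "a and b" in the comment above is an assumption, since the referenced formula is not preserved.

```python
def tweedie_to_cpg(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson-gamma parameters.

    Returns the Poisson rate and the gamma shape and scale of each summand.
    """
    if not (1 < p < 2):
        raise ValueError("1 < p < 2 required")
    lam = mu ** (2 - p) / (phi * (2 - p))
    shape = (2 - p) / (p - 1)
    scale = phi * (p - 1) * mu ** (p - 1)
    return lam, shape, scale


def cpg_to_tweedie(lam, shape, scale):
    """Inverse map: recover (mu, phi, p) from Poisson rate and gamma shape/scale."""
    p = (shape + 2) / (shape + 1)
    mu = lam * shape * scale  # mean of the compound sum
    variance = lam * shape * (shape + 1) * scale ** 2
    phi = variance / mu ** p  # since Var = phi * mu**p
    return mu, phi, p


# Round trip with illustrative values (assumptions, not from the thread).
lam, shape, scale = tweedie_to_cpg(mu=2.0, phi=0.5, p=1.5)
mu_back, phi_back, p_back = cpg_to_tweedie(lam, shape, scale)
print(lam, shape, scale)
print(mu_back, phi_back, p_back)
```

The inverse map shows that phi is determined once the Poisson rate and both gamma parameters are known, which is the quantity the roundabout approach would need to recover.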

ShreeshaM07 commented 2 months ago

Some thought on the Tweedie Distribution

fkiraly commented 2 months ago

> From the conversation I can infer that we can implement this in skpro as it allows for mixed type distributions with pdf and pmf in different intervals.

Yes, assuming you mean the p parameter. In places where the distribution is entirely discrete or entirely continuous, the pdf or pmf, respectively, will return zero.

Further, here's an interesting option, since multiple already implemented distributions figure as special cases:

Here is an illustration of the suggested delegator approach: [image] (Tweedie is a delegator compound of Tweedie ED families)
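The case distinction behind such a delegator could look roughly like the sketch below. It only maps the power parameter p to the name of the corresponding family; the function name `tweedie_family` is hypothetical and no skpro classes are used, so this illustrates the dispatch logic and not skpro's actual API.

```python
def tweedie_family(p):
    """Return the name of the Tweedie family for power parameter p.

    Hypothetical helper that illustrates the case distinction only.
    """
    if p == 0:
        return "Normal"
    if 0 < p < 1:
        # No Tweedie exponential dispersion model exists in this range.
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    if p == 1:
        # Plain Poisson for phi = 1, a scaled Poisson otherwise.
        return "Poisson"
    if 1 < p < 2:
        # Mixed type: point mass at zero plus a continuous part on x > 0.
        return "compound Poisson-gamma"
    if p == 2:
        return "Gamma"
    if p == 3:
        return "inverse Gaussian"
    # Remaining cases (p < 0, or p > 2 other than 3) are stable-type families.
    return "other Tweedie exponential dispersion family"


for power in (0, 1, 1.5, 2, 3):
    print(power, tweedie_family(power))
```

A delegator distribution could use a dispatch of this kind to pick an already implemented distribution for the special cases and fall back to a dedicated implementation otherwise.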

fkiraly commented 2 months ago

Opened new issue on Tweedie distribution here, as that does not seem too straightforward - for further discussion. https://github.com/sktime/skpro/issues/429