stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Adding Beta distribution #277

Status: Open. Fish-Soup opened this issue 2 years ago.

Fish-Soup commented 2 years ago

I would like to add a Beta distribution (like the SciPy Beta distribution). I've had a little look at the code, and it seems I can do this myself without the package needing to be updated. Is there a guide, or any advice on how to set this up? I see two possible issues:

1) The Beta distribution has two parameters, and both must be positive.
2) If both parameters are less than 1, the pdf is U-shaped: the density is high near 0 and 1 and lower in between. This makes it look very similar to having two populations, one concentrated at 0 and the other at 1. So it seems like a good idea to be able to restrict both parameters to values greater than 1.

Many thanks.
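To make issue 2 concrete, here is a small sketch using only the standard library. `beta_pdf` is a hypothetical helper written for this illustration, not an ngboost or SciPy function. It evaluates the Beta density through log-gamma; with α = β = 0.5 the density near 0 and 1 is larger than at 0.5, while α = 2, β = 5 and α = β = 3 each give a single interior peak.

```python
import math

def beta_pdf(y, a, b):
    """Density of Beta(a, b) at y in (0, 1).

    Uses log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b) so that
    large parameters do not overflow the gamma function.
    """
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(y) + (b - 1) * math.log(1 - y) - log_beta_fn)

# Compare the shape for parameters below 1 (U-shaped) and above 1 (unimodal).
for a, b in [(0.5, 0.5), (2.0, 5.0), (3.0, 3.0)]:
    row = [round(beta_pdf(y, a, b), 3) for y in (0.01, 0.25, 0.5, 0.75, 0.99)]
    print(f"Beta({a}, {b}):", row)
```

For α = β = 0.5 the density actually diverges at both endpoints, which is why a fit with both parameters below 1 behaves like a mixture of two populations at 0 and 1.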

alejandroschuler commented 2 years ago

Hey @Fish-Soup, we'd love to have the beta distribution added, so if you implement this yourself feel free to make a pull request and we can get it in the package.

As for guidance, I recommend looking at the developer guide. Hopefully that will answer most of your questions, but I'll also address what you've written here.

In ngboost, every distribution must be parametrized by a finite number of parameters, each taking values in the real numbers. At first glance, that means there's no way to restrict a parameter to be positive, negative, between 0 and 1, etc. However, it's actually not too difficult to get around that. The trick is to use a continuous one-to-one transformation that maps the domain of interest (e.g. the positive reals (0, ∞)) onto the reals (-∞, ∞). For example, our implementation of the normal distribution uses the two parameters μ and log(σ). This implicitly restricts σ ∈ (0, ∞) via the continuous transformation log: (0, ∞) -> (-∞, ∞). Hopefully that makes sense!
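Applied to the Beta distribution, the same trick could look like the minimal sketch below (standard library only). The names `to_alpha_beta`, `beta_nll`, `beta_nll_grad` and the `min_value` argument are hypothetical, not part of ngboost. The internal parameters θ = (θ_a, θ_b) are unconstrained reals, and α = min_value + exp(θ_a), β = min_value + exp(θ_b). With `min_value=0.0` this is the log parametrization described above; with `min_value=1.0` it is an assumed way to enforce the α, β > 1 restriction from the question, still a continuous one-to-one map, now onto (1, ∞). The sketch also computes the gradient of the negative log-likelihood with respect to θ via the chain rule; as far as we understand the developer guide, this is the kind of quantity a score's `d_score` method is expected to return, but check the guide for the exact interface.

```python
import math

def digamma(x):
    """Digamma function for x > 0: shift x upward with the recurrence
    psi(x) = psi(x + 1) - 1/x, then apply the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132)))
    )
    return result

def to_alpha_beta(theta, min_value=0.0):
    """Map unconstrained internal params (theta_a, theta_b) to (alpha, beta).
    min_value=0.0 gives values in (0, inf); min_value=1.0 gives (1, inf)."""
    return tuple(min_value + math.exp(t) for t in theta)

def beta_nll(y, theta, min_value=0.0):
    """Negative log-likelihood of an observation y in (0, 1) under Beta(alpha, beta)."""
    a, b = to_alpha_beta(theta, min_value)
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_beta_fn - (a - 1) * math.log(y) - (b - 1) * math.log(1 - y)

def beta_nll_grad(y, theta, min_value=0.0):
    """Gradient of beta_nll with respect to the internal params theta.

    d nll / d alpha = psi(alpha) - psi(alpha + beta) - log(y), and
    d alpha / d theta_a = exp(theta_a) = alpha - min_value (likewise for beta).
    """
    a, b = to_alpha_beta(theta, min_value)
    psi_ab = digamma(a + b)
    g_a = (a - min_value) * (digamma(a) - psi_ab - math.log(y))
    g_b = (b - min_value) * (digamma(b) - psi_ab - math.log(1 - y))
    return (g_a, g_b)

# Even a very negative internal value still yields a valid parameter.
print(to_alpha_beta((-3.0, 0.5)))                 # both > 0
print(to_alpha_beta((-3.0, 0.5), min_value=1.0))  # both > 1
print(beta_nll_grad(0.3, (-3.0, 0.5), min_value=1.0))
```

Note that `min_value=1.0` rules out not only the U-shaped fits but also the J-shaped ones where just one parameter is below 1, so it is a modelling choice rather than a free fix.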

StatMixedML commented 2 years ago

@Fish-Soup You might want to use XGBoostLSS, which supports estimating the Beta distribution.

https://github.com/StatMixedML/XGBoostLSS