stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Implement Poisson Distribution for NGBRegressor #81

Closed isaacanthony closed 4 years ago

isaacanthony commented 4 years ago

It would be really useful for count data if we could use the Poisson distribution with NGBRegressor, i.e.

from ngboost.distns import Poisson
...
ngb = NGBRegressor(Dist=Poisson).fit(X_train, Y_train)

Let me know if you have any questions. Thank you!

alejandroschuler commented 4 years ago

Couldn't agree more!

Are you interested in trying to implement this yourself? I've written a guide on how to do it at the end of our vignette and I would love to have it test-driven. I'm more than happy to provide support or clarify points wherever needed, and it would help us improve our documentation.

JMBurley commented 4 years ago

Thanks @alejandroschuler, I'm looking into implementing this with @isaacanthony and was wondering the following after reading the vignette:

  1. Can you explain how NGBoost "dynamically creates a new class", and what we need to do to make sure that works? The vignette says: "When a user asks NGBoost to use the Laplace distribution with the LogScore, NGBoost will first find the implementation of the log score that is compatible with Laplace, i.e. LaplaceLogScore, and dynamically create a new class that has both the attributes of the distribution and the appropriate implementation of the score."

  2. Is there an example of creating a custom Riemannian metric when not using an existing scoring class?

PS. Thanks for writing the vignette! Helped clarify a lot of the implementation questions I had.

alejandroschuler commented 4 years ago

Awesome, I'm really happy you're giving this a shot! Besides making this distribution available for use, it's going to accelerate the whole NGBoost project since it will serve as another worked example of implementing a new distribution and will improve the documentation. It's a huge contribution, so thanks again for that.

As for your questions:

  1. You don't have to do anything: we designed it so that this just works :) As long as your distribution has a scores class attribute that contains an appropriate implementation of the score (e.g. PoissonLogScore) which subclasses the appropriately-named score (e.g. LogScore), ngboost will be able to find and link it in the right way. If you want to look under the hood and see how it's done, check out the implementation() method in distn.py and manifold.py. It's not terribly complicated, only 15 or so lines of code altogether. (There's a rough sketch of how this could look for Poisson at the end of this comment.)

  2. There isn't an example of that right now because it's above the pay grade of most users. I'll go through your options from least to most complex to demonstrate what I mean:

If you want to use the log score, which is the default and the one most people know as the negative log-likelihood, the easiest thing to do as a developer is to lean on ngboost's default method that calculates the metric with Monte Carlo estimates.

If you want to make it faster, then what you have to do is derive the Riemannian metric, which for the log score is the Fisher information matrix. You have to derive the Fisher with respect to the internal ngboost parametrization (if that differs from the user-facing parametrization, e.g. log(sigma) rather than sigma). Deriving a Fisher is not necessarily easy, since you have to compute an expectation analytically, but there are lots of worked examples of Fisher matrices online that you can look through.
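For the Poisson case this issue is about, that derivation happens to be short. Suppose you pick theta = log(mu) as the internal parameter (a choice you'd make in the implementation, not something ngboost dictates). The negative log-likelihood of a count y is

exp(theta) - y*theta + log(y!)

so its gradient with respect to theta is exp(theta) - y = mu - y, and the Fisher information is

E[(mu - Y)^2] = Var(Y) = mu

because a Poisson's variance equals its mean. The metric is just the scalar mu.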

If you don't want to use the log score (say you want CRPS, for example), then ngboost does not (yet?) have a default method for calculating the metric, and you must implement it yourself. This is harder than deriving a Fisher because there are not many worked examples. The most general derivation process should follow the outline here, replacing the KL divergence (which is induced by the log score) with whichever divergence is induced by the scoring rule you want to use (e.g. L2 for CRPS), again taking care to derive with respect to the internal ngboost parametrization, not the user-facing one. For any particular score, there may be a specific closed-form expression you can use to calculate the metric across distributions (the expression for the Fisher information serves this purpose for the log score), or there may not be; I actually don't know the answer to this question! But if there were, that could suggest a default implementation for that score's metric() method.
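To make all of this concrete, here's a rough, untested sketch of what a Poisson implementation could look like. It just follows the pattern of the existing distributions; the internal log(mu) parameterization and the names (PoissonLogScore, etc.) are suggestions, not anything final, and the import paths are per the dev guide, so adjust if they've moved:

import numpy as np
from scipy.stats import poisson

from ngboost.distns.distn import RegressionDistn
from ngboost.scores import LogScore

class PoissonLogScore(LogScore):
    # ngboost mixes this class with the distribution class, so self also has
    # the distribution's attributes (self.mu, self.dist) available.
    def score(self, Y):
        return -self.dist.logpmf(Y)

    def d_score(self, Y):
        # gradient of the score w.r.t. the internal parameter theta = log(mu)
        D = np.zeros((len(Y), 1))
        D[:, 0] = self.mu - Y
        return D

    def metric(self):
        # Fisher information w.r.t. theta = log(mu) is just mu (derived above)
        FI = np.zeros((len(self.mu), 1, 1))
        FI[:, 0, 0] = self.mu
        return FI

class Poisson(RegressionDistn):
    n_params = 1
    scores = [PoissonLogScore]  # this is how ngboost finds and links the score

    def __init__(self, params):
        super().__init__(params)
        self.mu = np.exp(params[0])  # log link keeps mu positive
        self.dist = poisson(mu=self.mu)

    def fit(Y):
        # called on the class, following the convention of the existing distns;
        # the MLE of mu for a Poisson sample is just the sample mean
        return np.array([np.log(np.mean(Y))])

    def sample(self, m):
        return np.array([self.dist.rvs() for _ in range(m)])

    @property
    def params(self):
        return {"mu": self.mu}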

alejandroschuler commented 4 years ago

Just an FYI that I've split off the material at the end of the vignette into a separate developer guide and updated it with points from the discussion here.

JMBurley commented 4 years ago

Great, thanks Alejandro. That all makes sense.

Long-term I'd love to implement an extra scoring metric, but I'll hold off on that pending a solution to the simpler problem of the Poisson distribution...

alejandroschuler commented 4 years ago

Great, lmk if you get stuck anywhere!

paulbutera commented 4 years ago

Hello! Is this still being looked into?

I had a play with it but got a bit confused. The Poisson implementation in scipy is a bit different from Laplace: there's no fit method. I think I got around this by obtaining mu with a minimization function, but I got really confused trying to implement d_score.

Way over my head. Any help greatly appreciated.

tonyduan commented 4 years ago

@paulbutera Can you make a pull request with your code if that's available?

JMBurley commented 4 years ago

Isaac and I ended up looking at negative binomial (rather than Poisson). We got most of it implemented in a short session but were held up by robustly defining the gradient of a discrete distribution.

That said, I now have a continuous approximation of the negative binomial distribution in hand, so I think we can push it through when we have time [COVID-19 is not helping with our availability...]

alejandroschuler commented 4 years ago

Hmm, @JMBurley @isaacanthony the discrete nature of the distribution shouldn't be a problem. Remember that the gradient is of the score w.r.t. the parameter(s), which are continuous. For example, here's our implementation of the categorical distribution. This distribution is certainly discrete, but its K-1 parameters (the log odds of being in each class) are continuous. For a simpler example, you can check out our old code for the Bernoulli distribution, which is generalized by the categorical distribution. That code was written before the package got refactored to make it easier to develop, but hopefully you can still see how it would translate over.
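If it helps, here's a quick way to convince yourself there's nothing special about the discrete case: pick an internal parameterization (say theta = log(mu) for Poisson, just as an example) and check the analytic gradient of the negative log-likelihood against a finite difference in theta. The observed counts stay discrete; only the parameter moves.

import numpy as np
from scipy.stats import poisson

y = 3           # an observed count (discrete)
theta = 0.7     # internal parameter, theta = log(mu)
eps = 1e-6

nll = lambda t: -poisson(mu=np.exp(t)).logpmf(y)
numeric = (nll(theta + eps) - nll(theta - eps)) / (2 * eps)
analytic = np.exp(theta) - y   # mu - y

print(numeric, analytic)       # these should agree to several decimal places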

alejandroschuler commented 4 years ago

@JMBurley @isaacanthony @paulbutera see #143! If any of you are willing to review that PR, I'd appreciate another set of eyes.