stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Poisson branch #143

Closed btatkinson closed 4 years ago

btatkinson commented 4 years ago

Caveat: I don't have much experience with this. However, I think I was able to implement Poisson regression. I used Wolfram for the derivation. It seems to work pretty well.
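For readers following along, here is a minimal sketch of what such a derivation typically produces. It assumes the Poisson mean is parameterized as `theta = log(mu)`; the thread does not show the PR's actual parameterization, so treat that and every function name below as illustrative assumptions. Under that assumption, the negative log-likelihood is `exp(theta) - y*theta + log(y!)`, its gradient is `exp(theta) - y`, and the Fisher information is `exp(theta)`.

```python
import math


def poisson_nll(y, theta):
    """Negative log-likelihood of count y under a Poisson with mean exp(theta)."""
    return math.exp(theta) - y * theta + math.lgamma(y + 1)


def grad_nll(y, theta):
    """Derivative of the NLL with respect to theta: exp(theta) - y."""
    return math.exp(theta) - y


def fisher_information(theta):
    """E[(dNLL/dtheta)^2] = Var(y) = exp(theta) for a Poisson."""
    return math.exp(theta)


def natural_gradient(y, theta):
    """Ordinary gradient rescaled by the inverse Fisher information."""
    return grad_nll(y, theta) / fisher_information(theta)


# Central finite-difference check of the analytic gradient at one point.
y, theta, eps = 3, 0.7, 1e-6
numeric = (poisson_nll(y, theta + eps) - poisson_nll(y, theta - eps)) / (2 * eps)
print("analytic:", grad_nll(y, theta), "numeric:", numeric)
```

The natural gradient simplifies to `1 - y/mu`, which is zero exactly when the predicted mean equals the observed count.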

Feel free to correct me on any of this.

The power of NGBoost seems to be in the predictive uncertainty. Unfortunately, that advantage is lost when doing a Poisson regression, since the uncertainty of a Poisson distribution is fixed. For that reason, I am very interested in either modeling the underlying uncertainty in the mean (https://en.wikipedia.org/wiki/Poisson_distribution#Confidence_interval) or moving on to a Gamma or Tweedie distribution. I think those paths offer a lot of promise in my applications.

alejandroschuler commented 4 years ago

Thanks for the contribution @btatkinson! I'll have a look shortly.

Regarding your comments about predictive uncertainty: I wouldn't say the uncertainty of a Poisson distribution is fixed. Rather, I'd say that the variance and the mean are linked. If you predict a high average Y for a given X, you also predict high uncertainty in that Y, and vice versa.
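To make the mean-variance link concrete, here is a small standard-library sketch (not NGBoost code; the helper names are made up for illustration). It computes a central 90% prediction interval directly from the Poisson probability mass function and shows the interval widening as the predicted mean grows.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson with mean mu, computed in log space for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_quantile(q, mu):
    """Smallest integer k such that P(Y <= k) >= q."""
    k = 0
    cdf = poisson_pmf(0, mu)
    while cdf < q:
        k += 1
        cdf += poisson_pmf(k, mu)
    return k


def interval_width(mu, level=0.9):
    """Width of the central prediction interval at the given coverage level."""
    tail = (1 - level) / 2
    return poisson_quantile(1 - tail, mu) - poisson_quantile(tail, mu)


# The standard deviation is sqrt(mu), so larger predicted means imply wider intervals.
for mu in (1.0, 10.0, 100.0):
    print(f"mean={mu:6.1f}  sd={math.sqrt(mu):5.2f}  90% width={interval_width(mu)}")
```

So a single predicted parameter still yields a different interval for each X; what the Poisson cannot do is vary the spread independently of the mean.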

If you haven't already, I recommend checking out this blog post and the related questions that have been asked here: #76 #133. Might be useful to you in formulating approaches to tackle your application.

alejandroschuler commented 4 years ago

Just a heads up that I'm still planning on reviewing this but I've been too busy to get to it so far.

btatkinson commented 4 years ago

Thank you @JMBurley, I fully agree with all of your suggestions. I should've brought up the alternative method of fitting via a GitHub comment rather than commented-out code. I'll wait on @alejandroschuler or another repo author to comment before I make any changes there.

alejandroschuler commented 4 years ago

@btatkinson looks good to me! Sorry it took me a while to get around to it. My only question is about the initial fitting; I left a comment to that effect in the code. Not sure why we can't initialize with the mean, as you mention in your comment.
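For context on the initialization question, here is a hedged sketch using only the standard library. It illustrates the general fact that the maximum-likelihood estimate of a single shared Poisson mean is the sample mean, so under an assumed log-mean parameterization the starting value would be `log(mean(Y))`. The data and names are hypothetical, and this does not reproduce the code under review.

```python
import math


def mean_nll(counts, mu):
    """Average Poisson negative log-likelihood of counts under one shared mean mu."""
    total = sum(mu - y * math.log(mu) + math.lgamma(y + 1) for y in counts)
    return total / len(counts)


# Hypothetical count data, for illustration only.
counts = [0, 2, 3, 1, 4, 2, 6, 2]

sample_mean = sum(counts) / len(counts)
init_logmu = math.log(sample_mean)  # starting value if the parameter is log(mu)

# Brute-force check: the NLL over a grid of candidate means bottoms out at the sample mean.
grid = [0.5 + 0.01 * i for i in range(500)]
best_mu = min(grid, key=lambda mu: mean_nll(counts, mu))

print("sample mean:", sample_mean)
print("grid minimizer:", round(best_mu, 2))
print("initial log-mean:", init_logmu)
```

One caveat with this approach: if every observed count is zero, the sample mean is zero and its logarithm is undefined, so a real implementation would need to guard against that case.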

I can approve this as-is, but I think it would also be great to add an .ipynb or .py script to the examples folder demonstrating the use of this distribution on a relevant dataset. Would you mind putting that together and committing? I feel like it's relevant to this PR, so it's best to include it here. I will also update the docs to include the Poisson distribution after we've got an example.

PS @JMBurley thanks for being a champ and doing such a thorough code review!

ryan-wolbeck commented 4 years ago

@btatkinson as part of this PR, I'd suggest adding the distribution to the test file as well: https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/tests/test_distns.py

alejandroschuler commented 4 years ago

@btatkinson (bump)

btatkinson commented 4 years ago

Sorry, got caught up in other stuff. I'll get this taken care of.

alejandroschuler commented 4 years ago

> Sorry, got caught up in other stuff. I'll get this taken care of

no worries :) thanks again so much for your contribution!

alejandroschuler commented 4 years ago

@btatkinson very excited to get this merged! I know at least one other user (@maximilianpfau) is eagerly awaiting the addition :)

alejandroschuler commented 4 years ago

Looks good to me, thanks again @btatkinson !