stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Very large prediction intervals for Exponential distribution #206

Closed albertotonon closed 3 years ago

albertotonon commented 3 years ago

Thanks a lot to all devs for making this very useful library available to everyone.

I'm considering using this library at work to predict car market values, so that users get a range rather than a point estimate of their car's value.

By cross-validating I found that the exponential distribution gives the best results (by a large margin). Using the normal distribution also sometimes made training fail to converge. The problem I'm facing is that, with the exponential distribution, the 95% prediction intervals are very large, for example:

>>> from scipy import stats
>>> exp = stats.expon(scale=10000)
>>> exp.interval(0.95)
(253.17807984289897, 36888.79454113935)

Basically, in my case the predictions are ok but the intervals are really not usable. Is there anything I could do?

alejandroschuler commented 3 years ago

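Hmm, I don't think there's a way around this. As your example demonstrates, the exponential distribution only has one parameter. So if you want to predict a mean of 1/x (with x the rate), the variance will necessarily be 1/x^2. In other words, the standard deviation always equals the mean, so the interval width grows in proportion to the prediction.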
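To make that coupling concrete, here is a minimal sketch that uses only the closed-form quantile function of the exponential distribution. The helper name `expon_interval` is made up for illustration and is not part of NGBoost or SciPy.

```python
import math


def expon_interval(mean, level=0.95):
    """Central interval of an exponential distribution with the given mean."""
    tail = (1.0 - level) / 2.0
    # Quantile function of the exponential: Q(p) = -mean * ln(1 - p)
    lower = -mean * math.log(1.0 - tail)
    upper = -mean * math.log(tail)
    return lower, upper


for mean in (1_000, 10_000, 50_000):
    lo, hi = expon_interval(mean)
    # The interval width divided by the mean is the same for every mean
    print(mean, round(lo, 1), round(hi, 1), round((hi - lo) / mean, 3))
```

Whatever mean the model predicts, the 95% interval spans roughly 3.66 times that mean, so no amount of tuning can narrow it while the output distribution stays exponential.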

The answer is to use a distribution with parameters such that the mean and variance can vary independently (like the normal). I'm not sure what has caused your convergence issues, but have you tried shrinking the learning rate by an order of magnitude or two?
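A rough sketch of that suggestion, not a tested recipe: it assumes NGBoost's `NGBRegressor` with the `Normal` distribution, and it assumes `pred_dist(...).params` returns `loc` and `scale` arrays. The helper names, `learning_rate=0.001`, and `n_estimators=2000` are illustrative choices, not values from this thread.

```python
from statistics import NormalDist


def normal_interval(loc, scale, level=0.95):
    """Central interval of Normal(loc, scale); its width depends only on scale."""
    tail = (1.0 - level) / 2.0
    dist = NormalDist(mu=loc, sigma=scale)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def fit_normal_intervals(X_train, y_train, X_new, learning_rate=0.001, level=0.95):
    """Fit NGBoost with a Normal output and return one interval per row of X_new."""
    # Imported lazily so normal_interval() also works without ngboost installed.
    from ngboost import NGBRegressor
    from ngboost.distns import Normal

    # A small learning rate usually needs more boosting iterations (assumed values).
    model = NGBRegressor(Dist=Normal, learning_rate=learning_rate,
                         n_estimators=2000, verbose=False)
    model.fit(X_train, y_train)
    params = model.pred_dist(X_new).params  # assumed: dict with "loc" and "scale"
    return [normal_interval(float(m), float(s), level)
            for m, s in zip(params["loc"], params["scale"])]


# Same mean, different scale: the interval width can now vary on its own.
print(normal_interval(10_000, 1_000))
print(normal_interval(10_000, 3_000))
```

With two parameters the model can predict a tight interval for a common car and a wide one for a rare car, even when both have the same expected price. Since prices are positive and often right-skewed, a two-parameter distribution on the positive reals (such as a log-normal) may also be worth cross-validating.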

albertotonon commented 3 years ago

Thanks, @alejandroschuler, I'll give it a shot.