stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Calculate confidence intervals for regression #252

Closed: satya-pattnaik closed this issue 3 years ago

satya-pattnaik commented 3 years ago

Can we have a built-in function to calculate confidence intervals for regression?

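```python
import scipy.stats

X = load_data()  # placeholder: load your own feature matrix
# ... fit an NGBRegressor `model` with the default Normal distribution ...
distribution = model.pred_dist(X)
conf_int = scipy.stats.norm.interval(0.95, loc=distribution.loc, scale=distribution.scale)
```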

This obviously only works for the normal distribution. Is there some way to implement this for the most popular distributions? That would help.
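For any distribution with a quantile function (scipy calls it ppf), the central interval at level c is the pair of quantiles at (1 - c)/2 and (1 + c)/2; read off a predictive distribution, this is strictly a prediction interval for y. A minimal stdlib-only sketch, assuming you pass the quantile function in yourself; central_interval and exponential_ppf are hypothetical helper names, not ngboost APIs:

```python
import math
from statistics import NormalDist


def central_interval(ppf, confidence=0.95):
    """Return (lower, upper) bounds of the central `confidence` interval.

    `ppf` is any quantile (inverse CDF) function, e.g. a scipy frozen
    distribution's .ppf, statistics.NormalDist(...).inv_cdf, or a closed form.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    tail = (1.0 - confidence) / 2.0  # probability mass left out in each tail
    return ppf(tail), ppf(1.0 - tail)


# Normal(loc=10, scale=2): the same quantity scipy.stats.norm.interval computes.
normal_lo, normal_hi = central_interval(NormalDist(mu=10.0, sigma=2.0).inv_cdf, 0.95)


# Hypothetical exponential with scale 3, via its closed-form quantile -scale * ln(1 - q).
def exponential_ppf(q, scale=3.0):
    return -scale * math.log(1.0 - q)


exp_lo, exp_hi = central_interval(exponential_ppf, 0.90)
print(round(normal_lo, 3), round(normal_hi, 3))
print(round(exp_lo, 3), round(exp_hi, 3))
```

Taking the quantile function as an argument keeps the helper independent of any one distribution family; only a working ppf is needed.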

creatornadiran commented 3 years ago

Can we use bootstrap for this?

satya-pattnaik commented 3 years ago

We could use bootstrapping, and that is a good solution in general, but here we already have the conditional distribution, so why not take the intervals directly from it?
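To illustrate the point: if the predicted distribution matches the true conditional distribution, intervals read straight off its quantiles already reach the nominal coverage without any resampling. A small stdlib simulation, assuming a made-up heteroscedastic normal y | x that stands in for a perfectly fitted model's pred_dist:

```python
import random
from statistics import NormalDist

rng = random.Random(0)  # fixed seed so the result is reproducible
confidence = 0.95
tail = (1.0 - confidence) / 2.0
n = 10_000
covered = 0
for _ in range(n):
    x = rng.uniform(0.0, 5.0)
    # Hypothetical "true" conditional distribution: mean 2x, spread grows with x.
    loc, scale = 2.0 * x, 0.5 + 0.3 * x
    y = rng.gauss(loc, scale)
    # Interval taken directly from the (assumed perfectly fitted) predictive distribution.
    dist = NormalDist(loc, scale)
    lower, upper = dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)
    covered += lower <= y <= upper

coverage = covered / n
print(f"empirical coverage: {coverage:.3f}")  # should be close to 0.95
```

In practice pred_dist comes from a fitted model, so the coverage is only as good as the model's calibration, and a single predictive distribution does not reflect uncertainty in the fitted model itself; that is the part a bootstrap over refits would add.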

creatornadiran commented 3 years ago

You are right, that makes sense.

MikeOMa commented 3 years ago

I think this already works for a subset of the distributions, like so:

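```python
import numpy as np
from ngboost import NGBRegressor

# Toy data: 500 rows, 2 features, pure-noise target.
X = np.random.randn(500, 2)
y = np.random.randn(500)

model = NGBRegressor(n_estimators=5)  # default distribution is Normal
model.fit(X, y)

pred = model.pred_dist(X)
# 'interval' is not defined on pred itself; it is resolved on the scipy distribution.
lower, upper = pred.interval(0.95)
```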

Most distributions are implemented so that, if the requested method (in this case 'interval') is not in dir(pred), the lookup falls back to the scipy implementation of the distribution. So any method scipy defines for that distribution is also available on the output of model.pred_dist(X).
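The fallback described above can be sketched with Python's __getattr__, which is only called when normal attribute lookup fails. This is a hypothetical, stdlib-only stand-in (PredictedNormal wrapping statistics.NormalDist rather than a frozen scipy distribution); ngboost's actual code may differ in detail:

```python
from statistics import NormalDist


class PredictedNormal:
    """Hypothetical wrapper that forwards unknown attributes to a distribution."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale
        # The wrapped distribution object (ngboost wraps a scipy.stats one instead).
        self.dist = NormalDist(mu=loc, sigma=scale)

    def __getattr__(self, name):
        # Called only when `name` is not found on the instance or its class,
        # so attributes defined on the wrapper always take precedence.
        if name == "dist":  # avoid infinite recursion if __init__ has not run
            raise AttributeError(name)
        return getattr(self.dist, name)


pred = PredictedNormal(loc=1.0, scale=2.0)
print(pred.loc)             # found on the wrapper itself
print(pred.inv_cdf(0.975))  # not defined on the wrapper: forwarded to NormalDist
print(pred.cdf(1.0))        # forwarded as well
```

A name that neither the wrapper nor the wrapped object defines still raises AttributeError, so missing methods fail loudly rather than silently.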

I think the multivariate normal is the only regression distribution that this does not work for, but I could be wrong!
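If the multivariate normal lacks an interval method, one workaround is per-coordinate marginal intervals: coordinate i of a multivariate normal is itself normal with mean mu_i and variance Sigma_ii. A stdlib sketch, assuming you have extracted the predicted mean vector and covariance matrix yourself (marginal_intervals is a hypothetical helper and relies on no particular ngboost attribute names). These are per-coordinate intervals, not a joint region:

```python
import math
from statistics import NormalDist


def marginal_intervals(mean, cov, confidence=0.95):
    """Central interval for each coordinate of a multivariate normal.

    mean: sequence of length d; cov: d x d covariance matrix (nested lists).
    Coordinate i is Normal(mean[i], sqrt(cov[i][i])), so its interval is
    mean[i] +/- z * sqrt(cov[i][i]), with z the standard normal quantile.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    intervals = []
    for i, mu in enumerate(mean):
        half_width = z * math.sqrt(cov[i][i])
        intervals.append((mu - half_width, mu + half_width))
    return intervals


# A 2-D prediction with correlated outputs; the off-diagonal terms
# do not affect the marginal intervals.
bounds = marginal_intervals([0.0, 5.0], [[1.0, 0.8], [0.8, 4.0]])
print(bounds)
```

If every coordinate must be covered simultaneously, a Bonferroni adjustment (confidence 1 - 0.05/d per coordinate for a 95% joint target) is one conservative option.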