scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.2k stars, 99 forks

EnbPI in MAPIE #474

Closed (valeman closed 3 days ago)

valeman commented 4 days ago

What classes of estimators does EnbPI in MAPIE work with?

The tutorial mentions RandomForest, but EnbPI as published in the paper is not limited to bagging estimators; it can work with any model.

Is there a gap in implementation vs the model in the paper?

If so, it would be good to have EnbPI work with any regression model class, including boosted trees (CatBoost/XGBoost/LightGBM) and scikit-learn regressors.

thibaultcordier commented 4 days ago

MAPIE works with all sklearn-compatible estimator classes. Consequently, EnbPI already works with any regression model class, including boosted trees (CatBoost/XGBoost/LightGBM) and scikit-learn regressors.

The time series tutorial (https://mapie.readthedocs.io/en/latest/examples_regression/4-tutorials/plot_ts-tutorial.html) uses RandomForest as an illustration, but it is not restricted to this class. You can experiment by replacing the estimator on line 158 of the tutorial with one of the estimators you have just mentioned:

from sklearn.linear_model import LinearRegression
model = LinearRegression()
# or
from sklearn.ensemble import AdaBoostRegressor
model = AdaBoostRegressor()
# or
from sklearn.ensemble import GradientBoostingRegressor
model = GradientBoostingRegressor()
# or
from lightgbm import LGBMRegressor
model = LGBMRegressor()
# or
from xgboost import XGBRegressor
model = XGBRegressor()

In conclusion, MAPIE is not limited to bagging estimators: it can work with any model, and there is no gap between the implementation and the model presented in the article.
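To make the "any model" point concrete, here is a minimal sketch of an EnbPI-style procedure that only assumes the base model exposes fit and predict. It is not MAPIE's implementation: it uses a plain (non-block) bootstrap, a symmetric interval, and no residual updating, and the names LeastSquares and enbpi_like_intervals are hypothetical, introduced only for this illustration.

```python
import copy

import numpy as np


class LeastSquares:
    """Tiny stand-in regressor (hypothetical) exposing fit/predict."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])  # append an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def enbpi_like_intervals(model, X_train, y_train, X_test,
                         alpha=0.1, n_boot=30, seed=0):
    """Simplified EnbPI-style intervals for any model with fit/predict."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    oob_preds = np.full((n_boot, n), np.nan)  # out-of-bag train predictions
    test_preds = np.empty((n_boot, len(X_test)))

    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # bootstrap sample with replacement
        fitted = copy.deepcopy(model).fit(X_train[idx], y_train[idx])
        oob = np.setdiff1d(np.arange(n), idx)  # points this model never saw
        oob_preds[b, oob] = fitted.predict(X_train[oob])
        test_preds[b] = fitted.predict(X_test)

    # Residuals use only models that did not train on the point in question.
    residuals = np.abs(y_train - np.nanmean(oob_preds, axis=0))
    residuals = residuals[~np.isnan(residuals)]
    width = np.quantile(residuals, 1 - alpha)

    center = test_preds.mean(axis=0)  # aggregate the ensemble by averaging
    return center, center - width, center + width


# Synthetic data, purely for illustration.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=300)

center, lower, upper = enbpi_like_intervals(
    LeastSquares(), X[:200], y[:200], X[200:]
)
coverage = np.mean((y[200:] >= lower) & (y[200:] <= upper))
print(f"empirical coverage: {coverage:.2f}")
```

Nothing in the function depends on bagging-specific internals, so any sklearn-compatible regressor, such as the ones listed above, could be passed in place of LeastSquares. For real time series work the MAPIE tutorial remains the reference.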

valeman commented 1 day ago

Thank you