scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

MapieRegressor sets method to 'base' from ACI #447

Closed dweprinz closed 5 months ago

dweprinz commented 6 months ago

Describe the bug I'm trying to implement ACI on a prefit regressor. However, self.method is then set to 'base', which prevents ACI from updating the conformity scores and alpha.

To Reproduce Steps to reproduce the behavior:

  1. Create a mapie_regressor = MapieTimeSeriesRegressor(estimator, method='aci', cv='prefit')
  2. Calibrate using mapie_regressor.fit(X_cal, y_cal)
  3. Look at mapie_regressor.method
  4. It is now 'base'
  5. In the mapie_regressor.update step I receive: ValueError: Invalid method. Allowed values are ['enbpi', 'aci'].

Expected behavior I would expect the method not to be changed to base.

Am I using this incorrectly? Otherwise, it seems this could be fixed by updating MapieRegressor._check_fit_parameters() to allow ACI with a prefit estimator.

thibaultcordier commented 6 months ago

Hello Derck! Thank you for reporting this bug.

TL;DR: You are using the class and its methods correctly. The problem is a side effect that had not been accounted for (a conflict between cv='prefit' and method='aci'). A patch is planned to fix it. See below for a temporary workaround.

Confirmation of the bug

You have indeed identified a side effect of MapieTimeSeriesRegressor when using cv='prefit': the method attribute is overwritten with the value 'base' when the fit method is invoked.

This problem does not occur when using other cv objects (such as BlockBootstrap as used in this notebook).

Remedy

The proposed workaround is to reset the method attribute to 'aci' right after the call to the fit method:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

from mapie.regression import MapieTimeSeriesRegressor

X, y = make_regression()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
X_train, X_cal, y_train, y_cal = train_test_split(X_train, y_train, test_size=0.5, random_state=42)

estimator = LinearRegression().fit(X_train, y_train)
mapie_regressor = MapieTimeSeriesRegressor(estimator, method='aci', cv='prefit')
mapie_regressor.fit(X_cal, y_cal)
# Add the following line to solve the problem
mapie_regressor.method = 'aci'
mapie_regressor.update(X_test, y_test, gamma=0.1, alpha=0.1)

This snippet of code works on my side. Check that it also works on your side.

Conclusion

The MapieTimeSeriesRegressor class inherits from the MapieRegressor class. However, MapieRegressor overwrites the method attribute when fit is called with cv='prefit'. This can be resolved quickly with the workaround above: when cv='prefit', retain the value of the method attribute so that it is not lost. A patch is planned to fix this problem.
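To make the mechanism concrete, here is a toy sketch of how a parent class can overwrite an attribute that a subclass later relies on, and one way a fix could retain it. All class names and the _resolve_method hook are hypothetical and greatly simplified; this is not MAPIE's actual implementation or the planned patch.

```python
class BaseRegressorSketch:
    """Hypothetical parent class (stands in for the role of MapieRegressor)."""

    def __init__(self, method="plus", cv=None):
        self.method = method
        self.cv = cv

    def _resolve_method(self):
        # Side effect described in the thread: 'prefit' forces 'base'.
        if self.cv == "prefit":
            return "base"
        return self.method

    def fit(self, X, y):
        self.method = self._resolve_method()
        return self


class TimeSeriesRegressorSketch(BaseRegressorSketch):
    """Hypothetical subclass whose update step needs 'enbpi' or 'aci'."""

    allowed_update_methods = ("enbpi", "aci")

    def update(self, X, y):
        if self.method not in self.allowed_update_methods:
            raise ValueError(
                "Invalid method. Allowed values are "
                f"{list(self.allowed_update_methods)}."
            )
        return self


class PatchedTimeSeriesRegressorSketch(TimeSeriesRegressorSketch):
    """One possible fix: keep the requested method when cv='prefit'."""

    def _resolve_method(self):
        if self.cv == "prefit" and self.method in self.allowed_update_methods:
            return self.method
        return super()._resolve_method()


buggy = TimeSeriesRegressorSketch(method="aci", cv="prefit").fit([[0.0]], [0.0])
patched = PatchedTimeSeriesRegressorSketch(method="aci", cv="prefit").fit([[0.0]], [0.0])
print(buggy.method, patched.method)  # prints: base aci
```

In this sketch, calling buggy.update(...) raises the same kind of ValueError as in the report, while patched.update(...) succeeds because the subclass retains 'aci' through fit.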

dweprinz commented 6 months ago

Hey Thibault! Thank you so much for your fast response! This works perfectly fine indeed.