scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Can MAPIE handle a regressor built using a stacking approach? #226

Closed ivan-marroquin closed 1 year ago

ivan-marroquin commented 2 years ago

Hi,

Many thanks for making available this great package!

I built a regressor pipeline following the stacking approach described at http://rasbt.github.io/mlxtend/user_guide/regressor/StackingRegressor/

My stacking model is organized as follows:

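import xgboost as xgb
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge, SGDRegressor

# base models: linear regressors, plus XGBoost with a linear booster
gboost = xgb.XGBRegressor(objective='reg:squarederror', booster='gblinear')
ridge = Ridge(alpha=0.25)
linear = LinearRegression()
sgd = SGDRegressor(alpha=0.25)
lasso = Lasso(alpha=0.25)
elastic = ElasticNet(alpha=0.25)

base_models = (ridge, sgd, lasso, linear, gboost, elastic)

# meta-model: XGBoost with a tree booster
meta_model = xgb.XGBRegressor(n_estimators=100, objective='reg:squarederror', booster='gbtree')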

Will MAPIE be able to handle this type of regressor? If so, do you have suggestions on how to design the class so that it supports both the stacking and the MAPIE computations?

Best regards, Ivan
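For comparison, here is a minimal sketch of a similar stack built with scikit-learn's own `StackingRegressor`, which exposes the standard `fit(X, y)` / `predict(X)` interface that wrappers generally expect. It is not the poster's implementation. Two assumptions are made to keep it free of extra dependencies: `GradientBoostingRegressor` stands in for the XGBoost `gbtree` meta-model, and the XGBoost `gblinear` base model is omitted. The data is synthetic.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge, SGDRegressor

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Linear base models mirroring the ones listed above (gblinear omitted).
base_models = [
    ("ridge", Ridge(alpha=0.25)),
    ("sgd", SGDRegressor(alpha=0.25, random_state=0)),
    ("lasso", Lasso(alpha=0.25)),
    ("linear", LinearRegression()),
    ("elastic", ElasticNet(alpha=0.25)),
]

# Assumption: GradientBoostingRegressor replaces the XGBoost gbtree meta-model.
meta_model = GradientBoostingRegressor(n_estimators=100, random_state=0)

# cv=5 builds the meta-model's training features from out-of-fold predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=meta_model, cv=5)

# The stack can be cloned and fitted with the plain fit(X, y) call.
fitted = clone(stack).fit(X, y)
predictions = fitted.predict(X[:10])
print(predictions.shape)
```

Because the cross-validation happens inside `fit`, no separate validation set has to be passed in by the caller.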

ivan-marroquin commented 2 years ago

Hi,

My class is defined as follows:

from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin

class MyEstimator(BaseEstimator, RegressorMixin, TransformerMixin):

    def __init__(self, parameters, base_models, meta_model):
        # set of linear regressor models
        self.base_models = base_models

        # gradient boost decision tree regressor model
        self.parameters = parameters

        self.meta_model = meta_model

        # use 5 folds for training base regressor models
        self.n_folds = 5

    def get_params(self, deep=True):
        return super().get_params(deep)

    # train base regressor models
    def fit(self, train_inputs, train_targets, validation_inputs, validation_targets):
        # code to train "base_models" and "meta_model"
        return self

    def predict(self, validation_inputs):
        # code to predict using the trained "base_models" and "meta_model"
        return meta_predictions

I am using MAPIE 0.5.0

I examined the Python code for regression (https://github.com/scikit-learn-contrib/MAPIE/blob/master/mapie/regression.py). It seems to me that the way I defined "fit" conflicts with the "fit" that MAPIE defines.

If I try the following:

A) mapie_regressor.fit(gral_train_inputs, gral_train_targets.reshape(-1,)) # to match MAPIE's "fit"

File "C:\Temp\Python_3.8.10\lib\site-packages\mapie\regression.py", line 562, in fit
    self.single_estimator_ = fit_estimator(
File "C:\Temp\Python_3.8.10\lib\site-packages\mapie\utils.py", line 119, in fit_estimator
    estimator.fit(X, y)
TypeError: fit() missing 2 required positional arguments: 'validation_inputs' and 'validation_targets'

B) mapie_regressor.fit(gral_train_inputs, gral_train_targets.reshape(-1,), test_inputs, test_targets.reshape(-1,)) # to match MyEstimator's "fit"

TypeError: fit() takes from 3 to 4 positional arguments but 5 were given

Do you have a suggestion?

Ivan
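The signature mismatch in case A can be reproduced without MAPIE. The sketch below uses a hypothetical `FourArgumentRegressor` as a stand-in for `MyEstimator`, and a hypothetical helper `call_like_a_wrapper` that issues the same `estimator.fit(X, y)` call that appears in the traceback. Neither name comes from MAPIE or from the original class, and a single `Ridge` model replaces the whole stacking logic.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge


class FourArgumentRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical estimator whose fit() requires validation data."""

    def fit(self, train_inputs, train_targets, validation_inputs, validation_targets):
        # Placeholder training step; the validation arguments are unused here.
        self.model_ = Ridge(alpha=0.25).fit(train_inputs, train_targets)
        return self

    def predict(self, inputs):
        return self.model_.predict(inputs)


def call_like_a_wrapper(estimator, X, y):
    """Call fit with two arguments only, as in the traceback: estimator.fit(X, y)."""
    try:
        estimator.fit(X, y)
    except TypeError as error:
        return str(error)
    return None


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# The four-argument fit() fails; a standard scikit-learn estimator does not.
message = call_like_a_wrapper(FourArgumentRegressor(), X, y)
print(message)
print(call_like_a_wrapper(Ridge(alpha=0.25), X, y))
```

The first call reports the two missing positional arguments, which suggests that the conflict lies in the estimator's `fit` signature and not in the data passed to it.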

vincentblot28 commented 1 year ago

Hello @ivan-marroquin, could you please provide a reproducible example so we can help you fix this issue?

thibaultcordier commented 1 year ago

I propose to close this issue due to the lack of activity and because this item is not planned.

ivan-marroquin commented 1 year ago

In case you are still interested in giving it a try, you could use the stacking regressor class implemented by mlxtend: https://github.com/rasbt/mlxtend/blob/master/mlxtend/regressor/stacking_regression.py

I used this approach to develop my own class which combines xgboost (as the meta-regressor) with 5 linear regressors.
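A class of this kind can keep the standard `fit(X, y)` signature if the validation split is handled inside `fit`. The following is a minimal sketch under stated assumptions, not the class described in this thread. `SimpleStackingRegressor` is a hypothetical name, `GradientBoostingRegressor` stands in for xgboost, only two linear base models are used, and the data is synthetic. A wrapper that calls `estimator.fit(X, y)`, as in the traceback earlier in the thread, should in principle accept such an estimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict


class SimpleStackingRegressor(BaseEstimator, RegressorMixin):
    """Hypothetical stacking regressor with the standard fit(X, y) signature."""

    def __init__(self, base_models, meta_model, n_folds=5):
        # Store constructor arguments unchanged so that clone() and get_params() work.
        self.base_models = base_models
        self.meta_model = meta_model
        self.n_folds = n_folds

    def fit(self, X, y):
        # Out-of-fold predictions of each base model become the meta-model's
        # features, so no separate validation set is passed to fit().
        meta_features = np.column_stack([
            cross_val_predict(clone(model), X, y, cv=self.n_folds)
            for model in self.base_models
        ])
        self.meta_model_ = clone(self.meta_model).fit(meta_features, y)
        # Refit every base model on all training data for use at predict time.
        self.base_models_ = [clone(model).fit(X, y) for model in self.base_models]
        return self

    def predict(self, X):
        meta_features = np.column_stack(
            [model.predict(X) for model in self.base_models_]
        )
        return self.meta_model_.predict(meta_features)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.7]) + rng.normal(scale=0.1, size=120)

stack = SimpleStackingRegressor(
    base_models=[Ridge(alpha=0.25), LinearRegression()],
    meta_model=GradientBoostingRegressor(n_estimators=100, random_state=0),
)

# clone() followed by fit(X, y) is the call pattern a wrapper would use.
fitted = clone(stack).fit(X, y)
predictions = fitted.predict(X[:5])
print(predictions.shape)
```

Exposing `n_folds` as a constructor argument, instead of hard-coding it inside `__init__`, lets `get_params` and `clone` carry the setting over to copies of the estimator.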