scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

MapieQuantileRegressor with RandomForestQuantileRegressor from sklearn_quantile #409

Closed alessiasarica closed 8 months ago

alessiasarica commented 8 months ago

Hi guys, I'm trying to fit a MapieQuantileRegressor with a RandomForestQuantileRegressor from sklearn_quantile. Please bear in mind that I'm a researcher in applied ML for neuroscience, not a programmer.

In detail:

  1. I separately built three RandomForestQuantileRegressor models, one per quantile (0.05, 0.95 and 0.5), each with hyperparameter tuning:
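# Import paths assumed from the package names used in this post; they may differ by version
from sklearn.metrics import make_scorer, mean_pinball_loss
from sklearn.model_selection import RandomizedSearchCV
from sklearn_quantile import RandomForestQuantileRegressor
from mapie.regression import MapieQuantileRegressor

# X_train, y_train, X_calib, y_calib, param_grid and seed are assumed to be defined beforehand
q1 = 0.05  # lower quantile
q2 = 0.95  # upper quantile
q3 = 0.5   # median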

neg_mean_pinball_loss_q1_scorer = make_scorer(
    mean_pinball_loss,
    alpha=q1,
    greater_is_better=False,  # maximize the negative loss
)

neg_mean_pinball_loss_q2_scorer = make_scorer(
    mean_pinball_loss,
    alpha=q2,
    greater_is_better=False,  # maximize the negative loss
)

neg_mean_pinball_loss_q3_scorer = make_scorer(
    mean_pinball_loss,
    alpha=q3,
    greater_is_better=False,  # maximize the negative loss
)

rfqr_q1 = RandomForestQuantileRegressor(q=q1)
random_search_rfqr_q1 = RandomizedSearchCV(
    estimator=rfqr_q1, param_distributions=param_grid, cv=5, n_iter=5,
    random_state=seed, scoring=neg_mean_pinball_loss_q1_scorer,
)
random_search_rfqr_q1.fit(X_train, y_train)
random_search_rfqr_q1.best_score_

rfqr_q1 = RandomForestQuantileRegressor(
    q=q1, **random_search_rfqr_q1.best_params_, random_state=seed
)
rfqr_q1.fit(X_train, y_train)

rfqr_q2 = RandomForestQuantileRegressor(q=q2)
random_search_rfqr_q2 = RandomizedSearchCV(
    estimator=rfqr_q2, param_distributions=param_grid, cv=5, n_iter=5,
    random_state=seed, scoring=neg_mean_pinball_loss_q2_scorer,
)
random_search_rfqr_q2.fit(X_train, y_train)
random_search_rfqr_q2.best_score_

rfqr_q2 = RandomForestQuantileRegressor(
    q=q2, **random_search_rfqr_q2.best_params_, random_state=seed
)
rfqr_q2.fit(X_train, y_train)

rfqr_q3 = RandomForestQuantileRegressor(q=q3)
random_search_rfqr_q3 = RandomizedSearchCV(
    estimator=rfqr_q3, param_distributions=param_grid, cv=5, n_iter=5,
    random_state=seed, scoring=neg_mean_pinball_loss_q3_scorer,
)
random_search_rfqr_q3.fit(X_train, y_train)
random_search_rfqr_q3.best_score_

rfqr_q3 = RandomForestQuantileRegressor(
    q=q3, **random_search_rfqr_q3.best_params_, random_state=seed
)
rfqr_q3.fit(X_train, y_train)
  2. I created a list with the three models to pass to MapieQuantileRegressor:

    models_rfqr = []
    models_rfqr.append(rfqr_q1)
    models_rfqr.append(rfqr_q2)
    models_rfqr.append(rfqr_q3)
    cqr = MapieQuantileRegressor(estimator=models_rfqr, alpha=0.1)
  3. I fitted the MapieQuantileRegressor and got a ValueError:

    cqr.fit(X_calib, y_calib)
    ValueError: Invalid estimator. Please provide a regressor with fit and predict methods.
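
The wording of the error suggests that, when the models are not declared as prefit, the estimator argument is checked for fit and predict methods, and a plain Python list has neither. The sketch below reproduces that kind of check in isolation. `DummyRegressor` and `has_fit_and_predict` are hypothetical names made up for illustration; this is not MAPIE's actual code.

```python
class DummyRegressor:
    """Hypothetical stand-in for any scikit-learn-style regressor."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0.0 for _ in X]


def has_fit_and_predict(estimator):
    """Hypothetical duck-typing check: does the object expose fit and predict?"""
    return all(
        callable(getattr(estimator, name, None)) for name in ("fit", "predict")
    )


single_model = DummyRegressor()
model_list = [DummyRegressor(), DummyRegressor(), DummyRegressor()]

# A single regressor passes the check; a list of regressors does not,
# even though every element of the list would pass on its own.
print(has_fit_and_predict(single_model))
print(has_fit_and_predict(model_list))
print(all(has_fit_and_predict(m) for m in model_list))
```

Under that assumption, the list itself is rejected even though each model in it is a valid regressor.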

I updated the quantile_estimator_params:

new_estimator={'RandomForestQuantileRegressor': {'loss_name':'criterion','alpha_name': 'q'}}
MapieQuantileRegressor().quantile_estimator_params.update(new_estimator)
MapieQuantileRegressor().quantile_estimator_params

Is there a compatibility problem between MAPIE and sklearn_quantile?

Thanks, Alessia
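
For readers following step 1: the three scorers all wrap the pinball (quantile) loss, which penalises under- and over-prediction asymmetrically depending on the target quantile. Below is a minimal pure-Python sketch of that loss. `pinball_loss` is a hypothetical helper written for illustration, and the numbers are made up.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball loss for target quantile alpha (hypothetical helper)."""
    total = 0.0
    for y, pred in zip(y_true, y_pred):
        diff = y - pred
        if diff >= 0:
            # Prediction too low: weighted by alpha
            total += alpha * diff
        else:
            # Prediction too high: weighted by (1 - alpha)
            total += (1 - alpha) * (-diff)
    return total / len(y_true)


# Made-up data where every prediction is too low
y_true = [10.0, 12.0, 14.0]
y_pred = [8.0, 8.0, 8.0]

loss_low_quantile = pinball_loss(y_true, y_pred, alpha=0.05)
loss_high_quantile = pinball_loss(y_true, y_pred, alpha=0.95)

# Under-prediction is cheap for a low quantile and costly for a high one
print(loss_low_quantile, loss_high_quantile)
```

With greater_is_better=False, make_scorer negates this value so that a larger score is better during the search.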

alessiasarica commented 8 months ago

I solved it simply by adding cv="prefit":

cqr = MapieQuantileRegressor(estimator=models_rfqr, alpha=alpha_q, cv="prefit")
cqr.fit(X_calib, y_calib)
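
With cv="prefit", the three already-fitted models are not retrained; the calibration set is only used to adjust the interval they produce. A simplified sketch of the conformalized quantile regression idea behind that step follows, in pure Python with made-up numbers. `cqr_correction` is a hypothetical helper, and MAPIE's own implementation differs in detail.

```python
import math


def cqr_correction(y_calib, lower_pred, upper_pred, alpha):
    """Return the amount by which to widen each interval [lower, upper]."""
    # Conformity score: how far each calibration target falls outside its
    # predicted interval (negative when the target lies inside).
    scores = sorted(
        max(lo - y, y - hi) for y, lo, hi in zip(y_calib, lower_pred, upper_pred)
    )
    n = len(scores)
    # Finite-sample corrected rank, capped at n as a simplification
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return scores[rank - 1]


# Made-up calibration targets and lower/upper quantile predictions
y_calib = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 2.0, 9.0, 5.0, 6.0]
lower = [2.0, 4.0, 4.5, 5.0, 6.0, 6.0, 1.0, 7.0, 5.5, 5.0]
upper = [4.0, 6.0, 6.0, 7.0, 7.0, 8.0, 3.0, 8.5, 7.0, 6.5]

correction = cqr_correction(y_calib, lower, upper, alpha=0.2)

# Widen a new predicted interval by the correction on both sides
new_lower, new_upper = 4.0 - correction, 6.0 + correction

# Share of calibration targets covered once the intervals are widened
covered = sum(
    lo - correction <= y <= hi + correction
    for y, lo, hi in zip(y_calib, lower, upper)
)
print(correction, (new_lower, new_upper), covered / len(y_calib))
```

The median model plays no role in this correction; it only supplies the point prediction.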

LacombeLouis commented 7 months ago

Hey @alessiasarica, indeed, you can use it this way. If you wish to perform the split inside MAPIE, I refer you to issue #199. Thank you!