scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Support for non-BaseCrossValidator CV methods #518

Open nanophyto opened 1 week ago

nanophyto commented 1 week ago

I'm working with a highly zero-inflated dataset. Because of this, I'm using a custom-defined zero-stratified CV splitter in my sklearn pipeline.
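
For context, the splitter behaves roughly like this (a simplified sketch, not my exact code; it stratifies folds on whether the target is zero so every fold keeps a similar zero/non-zero ratio):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

class ZeroStratifiedSplitter:
    """Illustrative splitter (not a BaseCrossValidator subclass): keeps the
    zero/non-zero ratio of the target roughly constant across folds."""

    def __init__(self, n_splits=10, shuffle=True, random_state=None):
        self._skf = StratifiedKFold(
            n_splits=n_splits, shuffle=shuffle, random_state=random_state
        )

    def split(self, X, y, groups=None):
        # Stratify on the binary indicator "is the target exactly zero?"
        is_zero = (np.asarray(y) == 0).astype(int)
        return self._skf.split(X, is_zero)

    def get_n_splits(self, X=None, y=None, groups=None):
        return self._skf.get_n_splits()
```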

Currently, MAPIE does not seem to support CV splitters that are not BaseCrossValidator instances. If I instead run, e.g., MapieRegressor with cv=10, some of the folds end up nearly all zeros, resulting in very unrealistic lower bounds for the prediction interval (PI) estimates.

Would it be possible to add support for user-defined CV splitters?

thibaultcordier commented 5 hours ago

Hello @nanophyto ,

Thank you for bringing this issue to our attention.

Your Issue

You're running MapieRegressor with cv=10 on a zero-inflated dataset and are dissatisfied with the resulting prediction intervals. You'd like to use a CV splitter that is not a BaseCrossValidator, but the MAPIE package doesn't currently support this.

Temporary Suggestion

To help you in the meantime, consider using MapieQuantileRegressor to capture the heteroscedasticity of your data and obtain more realistic lower bounds for the prediction intervals. Unlike MapieRegressor, which produces prediction intervals of constant width, MapieQuantileRegressor adapts the interval width to the data and might therefore give more satisfying results.
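
Something along these lines, for example (a minimal sketch on toy zero-inflated data, assuming the MapieQuantileRegressor API with its default split-conformal setup; adapt the estimator, alpha, and splits to your use case):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
# Import path may differ slightly depending on your MAPIE version.
from mapie.quantile_regression import MapieQuantileRegressor

# Toy zero-inflated target: ~70% exact zeros, positive values otherwise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.where(rng.random(1000) < 0.7, 0.0, np.abs(X[:, 0]) + rng.exponential(1.0, 1000))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Base estimator trained with the pinball (quantile) loss, so the resulting
# intervals can adapt to heteroscedastic data.
estimator = GradientBoostingRegressor(loss="quantile")

# alpha=0.1 -> 90% prediction intervals; with the default cv="split", the data
# passed to fit() is split internally into training and calibration sets.
mapie_qr = MapieQuantileRegressor(estimator=estimator, alpha=0.1)
mapie_qr.fit(X_train, y_train)
y_pred, y_pis = mapie_qr.predict(X_test)  # point predictions and interval bounds
```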

Next Steps

Please share more details about your issue so we can better assist you. If you still need to use all of your data in a cross-validation setup, tell us more about the properties of your non-BaseCrossValidator splitter. This will help us determine whether it can be adapted to be BaseCrossValidator-compatible, or whether another solution is needed.
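
For instance, if your splitter can express each test fold as an array of indices, subclassing sklearn's BaseCrossValidator is often enough to make it usable through MAPIE's cv argument (a rough, untested sketch with hypothetical names, stratifying on the zero/non-zero indicator):

```python
import numpy as np
from sklearn.model_selection import BaseCrossValidator, StratifiedKFold

class ZeroStratifiedCV(BaseCrossValidator):
    """Sketch of a zero-stratified splitter made BaseCrossValidator-compatible."""

    def __init__(self, n_splits=10, shuffle=True, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state

    def _iter_test_indices(self, X, y=None, groups=None):
        # Delegate to StratifiedKFold on the "target is zero" indicator and
        # yield only the test indices, as BaseCrossValidator expects.
        skf = StratifiedKFold(
            n_splits=self.n_splits,
            shuffle=self.shuffle,
            random_state=self.random_state,
        )
        is_zero = (np.asarray(y) == 0).astype(int)
        for _, test_index in skf.split(X, is_zero):
            yield test_index

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

# It could then be passed directly, e.g. MapieRegressor(estimator, cv=ZeroStratifiedCV()).
```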

Looking forward to your response.