scikit-learn-contrib / MAPIE

A scikit-learn-compatible module to estimate prediction intervals and control risks based on conformal predictions.
https://mapie.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.2k stars 99 forks source link

Allow `MapieRegressor` to use K-fold iterator variants with stratification and groups. #393

Closed pidefrem closed 5 months ago

pidefrem commented 6 months ago

Description

Continuing the work done in PR Allow the use of GroupKFold cv-split (see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html) and also the use of custom cv-splits based on StratifiedKFold for example (see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold)

Fixes #202

Type of change

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checklist

thibaultcordier commented 6 months ago

Hello @pidefrem ! Thank you for this contribution. I'll take the time to review it. Just to be in line with your proposal, I understand that in Allow MapieRegressor to use group split strategy, the split strategy refers to the way of doing cross-validation but that it is a cross conformal method.

Don't hesitate to contact me if you need any help.

pidefrem commented 6 months ago

Hello @thibaultcordier, yes it refers to the split methods of the cross validator used during the fit of the MAPIE estimators. Please feel free to suggest any other description that you think is more suitable.

codecov-commenter commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (614293e) 100.00% compared to head (b9f25fa) 100.00%. Report is 210 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #393 +/- ## ========================================== Coverage 100.00% 100.00% ========================================== Files 39 39 Lines 4616 4887 +271 Branches 487 803 +316 ========================================== + Hits 4616 4887 +271 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

pidefrem commented 5 months ago

Wonderful! Thank you for this very welcome contribution. I don't have many comments to add but just a few to complete your proposal exhaustively:

  • some style code suggestions
  • As far as the tests are concerned, you are indeed testing whether MAPIE returns the same results using the same group (constant or None). But what happens if you use different groups like np.concat([np.ones(shape=n_samples/2), 2*np.ones(shape=n_samples/2)]) (untested proposal).
  • duplicate the tests in the classification test file because changes have been made in this part of the code.

@thibaultcordier I fixed some issues and added some tests, tell me if it is ok now.

thibaultcordier commented 5 months ago

Wonderful! Thank you for this very welcome contribution. I don't have many comments to add but just a few to complete your proposal exhaustively:

  • some style code suggestions
  • As far as the tests are concerned, you are indeed testing whether MAPIE returns the same results using the same group (constant or None). But what happens if you use different groups like np.concat([np.ones(shape=n_samples/2), 2*np.ones(shape=n_samples/2)]) (untested proposal).
  • duplicate the tests in the classification test file because changes have been made in this part of the code.

@thibaultcordier I fixed some issues and added some tests, tell me if it is ok now.

Hello @pidefrem, I'll check your changes this week. Thank you for contacting me about the review. I'll keep you informed.