ing-bank / probatus

Validation (like Recursive Feature Elimination for SHAP) of (multiclass) classifiers & regressors and data used to develop them.
https://ing-bank.github.io/probatus
MIT License

Allow for group-based cross-validation objects from scikit-learn #181

Closed: PaulZhutovsky closed this issue 2 years ago

PaulZhutovsky commented 2 years ago

Problem Description
Probatus feature elimination (e.g. ShapRFECV) currently does not support cross-validation objects that require a groups variable (e.g. StratifiedGroupKFold).
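For context, here is a small sketch of why such splitters break when groups are not forwarded. It uses scikit-learn's GroupKFold on made-up toy data. Calling split without groups, which is what a wrapper with no groups argument effectively does, raises a ValueError. Passing groups produces the expected folds. It does not reproduce ShapRFECV itself.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Made-up toy data: 6 samples belonging to 3 groups (e.g. customers).
X = np.zeros((6, 2))
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

cv = GroupKFold(n_splits=3)

# split() is a generator, so the missing-groups check fires on the first next().
try:
    next(cv.split(X, y))  # groups not forwarded
    error = None
except ValueError as exc:
    error = exc
print("without groups:", repr(error))

# With groups forwarded, each group forms its own test fold.
n_folds = sum(1 for _ in cv.split(X, y, groups=groups))
print("with groups:", n_folds, "folds")
```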

Desired Outcome
It would be great if this could be implemented, because groups can be used to prevent data leakage. For example, when several samples come from the same customer, all of that customer's samples should end up either in the training set or in the test set, but never in both.

Solution Outline
The fix should be fairly simple and can follow scikit-learn's RFECV implementation: add a groups argument (default: None) to the fit/fit_compute methods of ShapRFECV and pass it through to self.cv.split.