Problem Description
Probatus feature elimination (e.g. ShapRFECV) currently does not support cross-validation splitters that take a groups argument (e.g. StratifiedGroupKFold).
Desired Outcome
It would be great if this feature could be implemented, since groups can be used to prevent data leakage. For example, when multiple samples from the same customer are available, all of that customer's samples should be either in the training set or in the test set, but never in both.
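A minimal, stdlib-only sketch of the guarantee a group-aware splitter provides. The function `group_kfold_split` is hypothetical and only mimics the idea behind scikit-learn's GroupKFold/StratifiedGroupKFold (it does no stratification): whole groups are assigned to folds, so no customer ever appears on both sides of a split.

```python
from collections import defaultdict


def group_kfold_split(n_samples, groups, n_splits=2):
    """Hypothetical toy splitter: yield (train_idx, test_idx) with whole groups kept together."""
    # Collect the sample indices that belong to each group (e.g. each customer).
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    # Assign whole groups to folds round-robin, in order of first appearance.
    folds = [[] for _ in range(n_splits)]
    for i, g in enumerate(members):
        folds[i % n_splits].extend(members[g])
    for k in range(n_splits):
        test = sorted(folds[k])
        test_set = set(test)
        train = [i for i in range(n_samples) if i not in test_set]
        yield train, test


# Two samples per customer; a group-unaware split could separate them.
customers = ["a", "a", "b", "b", "c", "c", "d", "d"]
for train, test in group_kfold_split(len(customers), customers):
    train_groups = {customers[i] for i in train}
    test_groups = {customers[i] for i in test}
    # No customer is in both the training and the test set.
    assert not train_groups & test_groups
    print(sorted(train_groups), sorted(test_groups))
```

By contrast, a split that ignores groups (for instance, every even-indexed sample in the test set) would put every customer on both sides here.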
Solution Outline
The fix should be fairly simple and can follow scikit-learn's RFECV implementation: add a groups parameter (default: None) to the fit and fit_compute methods of ShapRFECV and pass it through to self.cv.split.
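A rough sketch of the proposed plumbing, assuming a heavily simplified elimination class. `EliminatorSketch` and `ToyGroupSplitter` are hypothetical stand-ins, not probatus or scikit-learn code; the splitter only follows scikit-learn's `split(X, y=None, groups=None)` convention and, like the real group-aware splitters, raises an error when groups is missing.

```python
class ToyGroupSplitter:
    """Hypothetical stand-in for a group-aware splitter such as StratifiedGroupKFold."""

    def split(self, X, y=None, groups=None):
        if groups is None:
            raise ValueError("The 'groups' parameter should not be None.")
        # One fold per distinct group: that group forms the test set.
        for g in dict.fromkeys(groups):
            test = [i for i, gi in enumerate(groups) if gi == g]
            train = [i for i, gi in enumerate(groups) if gi != g]
            yield train, test


class EliminatorSketch:
    """Hypothetical, simplified stand-in for ShapRFECV showing only the cv plumbing."""

    def __init__(self, cv):
        self.cv = cv

    def fit(self, X, y, groups=None):
        # groups defaults to None, so existing callers are unaffected;
        # it is forwarded unchanged to the splitter, as in scikit-learn's RFECV.
        self.folds_ = list(self.cv.split(X, y, groups))
        # ... per-fold model fitting and feature elimination would go here ...
        return self

    def fit_compute(self, X, y, groups=None):
        # Forward groups through fit so both entry points support it.
        # (The real method would return an elimination report, not the folds.)
        return self.fit(X, y, groups=groups).folds_


X = [[0.1], [0.2], [0.3], [0.4]]
y = [0, 1, 0, 1]
customers = ["a", "a", "b", "b"]
folds = EliminatorSketch(ToyGroupSplitter()).fit_compute(X, y, groups=customers)
print(folds)  # [([2, 3], [0, 1]), ([0, 1], [2, 3])]
```

Keeping groups as an optional keyword argument means splitters that ignore it (e.g. plain KFold) keep working unchanged.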