This is a first attempt to refactor the GroupedPredictor class (personally one of my favorite features 😁), following the issue raised in #616.
While working on this I noticed another set of issues with the implementation:
Following #579, decision_function will yield wrong results. The reason is that if two groups correspond to different classification problems, namely binary and multiclass (as happens in the unit tests), .decision_function(...) returns a 1d and a 2d array respectively, and concatenating them leads to wrong behaviour.
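To illustrate the shape mismatch, here is a minimal sketch using plain scikit-learn (LogisticRegression is just a stand-in for whatever inner estimator is used per group):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y_binary = np.array([0, 1] * 15)    # one group is a binary problem
y_multi = np.array([0, 1, 2] * 10)  # another group is a 3-class problem

binary_scores = LogisticRegression().fit(X, y_binary).decision_function(X)
multi_scores = LogisticRegression().fit(X, y_multi).decision_function(X)

print(binary_scores.shape)  # (30,)   -> 1d for the binary group
print(multi_scores.shape)   # (30, 3) -> 2d for the multiclass group
# Concatenating per-group outputs with these shapes cannot give a consistent result.
```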
This PR does not fix that issue yet, but I have a couple of ideas that need testing/triage.
Consider the following example with multiple grouping columns: let groups=["a", "b"] with observed values (0, 0), (0, 1) and (1, 0). Currently, if at prediction time we encounter (a=0, b=2), for which no model was trained, we fall back to the global one. However, I would argue that we should fall back to the model trained on a=0.
The PR implements this behaviour; however, I would like input on how to introduce it in a non-breaking manner (see the sketch below).
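As a starting point for that discussion, here is a rough sketch of the fallback lookup I have in mind. The names _resolve_estimator, estimators_ and global_model are hypothetical, not the actual GroupedPredictor internals:

```python
def _resolve_estimator(group_key, estimators_, global_model):
    """Try the full group key, then progressively shorter prefixes, then the global model."""
    for end in range(len(group_key), 0, -1):
        candidate = group_key[:end]        # e.g. (0, 2) -> (0,)
        if candidate in estimators_:
            return estimators_[candidate]
    return global_model                    # last resort: the global fallback


# Example: models fitted on (0, 0), (0, 1) and (1, 0), plus per-prefix models on "a" alone.
estimators_ = {
    (0, 0): "model_00", (0, 1): "model_01", (1, 0): "model_10",
    (0,): "model_a0", (1,): "model_a1",
}
# At prediction time we encounter (a=0, b=2): fall back to the model trained on a=0,
# not to the global one.
print(_resolve_estimator((0, 2), estimators_, "global_model"))  # -> "model_a0"
```

This assumes estimators are also fitted on the partial key prefixes; how (or whether) to expose that is part of the non-breaking-change question above.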
Type of change
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[X] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Checklist:
[X] My code follows the style guidelines (flake8)
[ ] I have commented my code, particularly in hard-to-understand areas
[ ] I have made corresponding changes to the documentation (also to the readme.md)
[ ] I have added tests that prove my fix is effective or that my feature works
[ ] I have added tests to check whether the new feature adheres to the sklearn convention
[ ] New and existing unit tests pass locally with my changes
TODOs
[ ] Update the current tests, which will break due to some API changes