Forgot to mention a couple of ideas:
The estimators_ attribute is something like the following dictionary:
{
    (1,): LogisticRegression(),
    (1, 'A'): LogisticRegression(),
    (1, 'B'): LogisticRegression(),
    (1, 'A', 'X'): LogisticRegression(),
    (1, 'B', 'Y'): LogisticRegression(),
}
which is ok-ish, but not super interpretable. One idea could be to have a namedtuple:
from collections import namedtuple

groups, grp_values = ["col1", "col2"], [1, 2]
estimator = namedtuple("estimator", groups)  # one field per grouping column
estimator(*grp_values)
# estimator(col1=1, col2=2)
This hits an edge case if a group name has a "_" prefix, since namedtuple field names cannot start with an underscore (maybe we can trim that; see the sketch after the example below), but the end result would be:
{
    estimator(sklego_global_estimator=1): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="A"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="B"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="A", col2="X"): LogisticRegression(),
    estimator(sklego_global_estimator=1, col1="B", col2="Y"): LogisticRegression(),
}
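For the record, the "_" edge case comes from namedtuple rejecting field names that start with an underscore; a minimal sketch of the trimming idea (purely illustrative):

from collections import namedtuple

groups = ["_col1", "col2"]
safe_fields = [g.lstrip("_") for g in groups]  # "_col1" -> "col1"
estimator = namedtuple("estimator", safe_fields)
estimator("A", "X")
# estimator(col1='A', col2='X')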
dataclasses would also work similarly.
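For comparison, a minimal sketch of the dataclass variant (illustrative only); dataclass fields may start with an underscore, so no trimming would be needed there:

from dataclasses import make_dataclass

groups, grp_values = ["col1", "col2"], [1, 2]
estimator = make_dataclass("estimator", groups, frozen=True)  # frozen -> hashable, so usable as a dict key
estimator(*grp_values)
# estimator(col1=1, col2=2)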
Possibility of accessing an estimator from its dictionary directly, namely being able to do hierarchical_predictor[(1, "A", "X")] in place of hierarchical_predictor.estimators_[(1, "A", "X")].
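A minimal sketch of that shortcut, assuming the fitted estimators_ mapping shown above (the mixin name is illustrative):

from sklearn.utils.validation import check_is_fitted

class _EstimatorAccessMixin:
    """Illustrative only: expose hierarchical_predictor[key] as sugar for .estimators_[key]."""

    def __getitem__(self, key):
        check_is_fitted(self, "estimators_")
        return self.estimators_[key]

# hierarchical_predictor[(1, "A", "X")] would then return the same object as
# hierarchical_predictor.estimators_[(1, "A", "X")]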
Related to both points above: since adding the 1 for the global model is not very ergonomic, we could create shortcuts by aliasing externally what is in the first point as:
{
    "__sklego_global_estimator__": LogisticRegression(),
    ('A',): LogisticRegression(),
    ('B',): LogisticRegression(),
    ('A', 'X'): LogisticRegression(),
    ('B', 'Y'): LogisticRegression(),
}
together with all the associated "conversions" from the first and/or second point, if accepted (a possible conversion helper is sketched below).
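A hypothetical conversion helper for that aliasing (the sentinel name follows the dictionary above; nothing here is from the PR itself):

GLOBAL_KEY = "__sklego_global_estimator__"

def to_public_keys(estimators_):
    # Drop the leading 1 used for the global model; alias the root entry with a readable sentinel.
    return {
        GLOBAL_KEY if len(key) == 1 else key[1:]: estimator
        for key, estimator in estimators_.items()
    }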
At the moment I'd be fine with this:
{
    (1,): LogisticRegression(),
    (1, 'A'): LogisticRegression(),
    (1, 'B'): LogisticRegression(),
    (1, 'A', 'X'): LogisticRegression(),
    (1, 'B', 'Y'): LogisticRegression(),
}
But it would be preferable to have this documented in the code with comments. I agree that it's merely "ok", but it feels clear enough if the comment is there, no?
Just added a few comments on those. Here is how it looks:
Ah yeah, that's even nicer. Just in the comments would've been sufficient but adding it in the docs for sure is a nice touch.
My main observation is that we might want to add an extra test that uses a dummy model to help predict the values that we'd expect. Other than that, this looks great! Good work :)
I am thinking out loud here, but maybe the easiest way to check this is to use a model with deterministic/fake predictions. Does that sound reasonable?
Oh, that was totally in line with what I had in mind. I've used Dummy models for this in the past, but you're also free to pick another method, as long as we have a test showing that our assumptions on how we shrink actually play out.
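For reference, a rough sketch of such a test using sklearn's DummyRegressor; the HierarchicalRegressor signature and the "equal" shrinkage spelling are assumptions based on this PR and may not match the final API:

import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklego.meta import HierarchicalRegressor

df = pd.DataFrame({
    "g1": ["A", "A", "B", "B"],
    "g2": ["X", "Y", "X", "Y"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})

# DummyRegressor(strategy="mean") predicts the group mean at every level, so with equal
# shrinkage the prediction should be the plain average of the global, g1-level and
# (g1, g2)-level means -- values we can compute by hand.
model = HierarchicalRegressor(
    estimator=DummyRegressor(strategy="mean"),
    groups=["g1", "g2"],
    shrinkage="equal",  # assumed spelling of the equal_shrinkage option from this PR
)
model.fit(df[["g1", "g2", "x"]], df["y"])

expected = np.mean([df["y"].mean(), df.loc[df["g1"] == "A", "y"].mean(), 10.0])
assert np.allclose(model.predict(df.loc[[0], ["g1", "g2", "x"]]), expected)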
Description
- HierarchicalPredictor, HierarchicalClassifier and HierarchicalRegressor - Partially fixes #620
- ShrinkageMixin to abstract such shared functionalities from HierarchicalPredictor and GroupedPredictor
- equal_shrinkage
Type of change
Checklist: