Closed FBruzzesi closed 8 months ago
Shrinkage doesn't really make sense, or at least not for all transformers. In fact the current GroupedTransformer doesn't support it.
Agree!
Should the group columns be returned as they are from the .transform(X) operation?
It feels like it would be fine to default to not returning the group columns, mainly because users could select them separately in the pipeline if they were so inclined. Might make for a good parameter though.
Should the .transform(X) output maintain the input type? That is, if the input is a pandas dataframe, should the index be maintained as well?
I would prefer not to rely on anything pandas specific with the advent of possible polars support. Is there a use-case you had in mind where that's required?
Ok, these will definitely be (at least) two separate PRs, one for predictor and one for transformer (the predictor one could land late this week 😁)
Might make for a good parameter though
"passthrough" vs "drop" seems suitable
Is there a use-case you had in mind where that's required?
Absolutely not. I have started relying on indexes as little as possible, but since pandas and polars input/output are supported by scikit-learn, I am wondering what behavior one should expect.
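To make the "passthrough" vs "drop" idea concrete, here is a minimal sketch in plain Python, deliberately free of pandas or polars. The function name grouped_center and the parameter name group_columns are hypothetical, chosen only for illustration, and per-group mean centering stands in for whatever transformer would actually be wrapped.

```python
from collections import defaultdict


def grouped_center(table, groups, group_columns="drop"):
    """Subtract the per-group mean from every non-group column.

    table: dict mapping column name -> list of values (all the same length).
    groups: names of the columns that define the groups.
    group_columns: "drop" omits the group columns from the output,
        "passthrough" returns them unchanged alongside the transformed ones.
        (Hypothetical parameter name, not an existing API.)
    """
    if group_columns not in ("drop", "passthrough"):
        raise ValueError("group_columns must be 'drop' or 'passthrough'")

    n_rows = len(next(iter(table.values())))

    # Collect the row positions that belong to each group key.
    rows_by_key = defaultdict(list)
    for i in range(n_rows):
        key = tuple(table[g][i] for g in groups)
        rows_by_key[key].append(i)

    out = {}
    if group_columns == "passthrough":
        for g in groups:
            out[g] = list(table[g])

    for name, values in table.items():
        if name in groups:
            continue
        centered = [0.0] * n_rows
        for rows in rows_by_key.values():
            mean = sum(values[i] for i in rows) / len(rows)
            for i in rows:
                centered[i] = values[i] - mean
        out[name] = centered
    return out


table = {"g": ["a", "a", "b"], "x": [1.0, 3.0, 5.0]}
dropped = grouped_center(table, groups=["g"])
kept = grouped_center(table, groups=["g"], group_columns="passthrough")
```

With "drop" as the default, the output contains only the transformed columns; "passthrough" puts the untouched group columns first. Row order is preserved in both cases, which sidesteps the index question without relying on anything pandas specific.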
Description
As noted in #616 and discussed on private channels, the GroupedPredictor class has its shortcomings. I am opening this issue to keep track of how we could expand on what we learned. Here is a list of features that HierarchicalPredictor and HierarchicalTransformer should have, in my opinion:
- Given groups = ["a", "b"], the groups to fit are: ["global"], ["global", "a"], ["global", "a", "b"]
- "next" or "raise", meaning:
  - "next": if a group value is not found at prediction/transformation time, it falls back to the first available group in the hierarchy.
  - "raise": if a group value is not found at prediction/transformation time, an error is raised.
- "parent"?
- [...] sample_weight), which is currently not possible.
A first draft of the implementation was developed in #618; however, trying to keep the current behavior of GroupedPredictor while also expanding its functionality was going to become a headache both to develop and to maintain. This is why an implementation from scratch would make sense.
Edit
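A few considerations regarding the hypothetical HierarchicalTransformer:
- Shrinkage doesn't really make sense, or at least not for all transformers. In fact the current GroupedTransformer doesn't support it.
- Should the group columns be returned as they are from the .transform(X) operation?
- Should the .transform(X) output maintain the input type? That is, if the input is a pandas dataframe, should the index be maintained as well?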
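
The hierarchy levels and the "next" / "raise" fallback from the description could be sketched roughly as follows. This is not the implementation from #618. The helper names hierarchy_levels and resolve_group are hypothetical, and "first available group in the hierarchy" is read here as the nearest less specific level that was seen at fit time, which is an assumption.

```python
def hierarchy_levels(groups):
    """Return the nested group levels, from most general to most specific."""
    return [["global", *groups[:i]] for i in range(len(groups) + 1)]


def resolve_group(fitted_keys, key, fallback="next"):
    """Pick which fitted group should handle `key`.

    fitted_keys: set of tuples of group values seen at fit time;
        the empty tuple () stands for the "global" level.
    key: tuple of group values for one row, most general column first.
    fallback: "next" walks up the hierarchy to the nearest fitted level,
        "raise" raises an error if the exact key was not fitted.
    """
    if fallback not in ("next", "raise"):
        raise ValueError("fallback must be 'next' or 'raise'")
    if key in fitted_keys:
        return key
    if fallback == "raise":
        raise KeyError(f"group {key!r} was not seen during fit")
    # Drop the most specific group value one at a time until a fitted
    # level is found; () is the global level.
    for depth in range(len(key) - 1, -1, -1):
        candidate = key[:depth]
        if candidate in fitted_keys:
            return candidate
    raise KeyError(f"no fitted level available for group {key!r}")


levels = hierarchy_levels(["a", "b"])
fitted = {(), ("x",), ("x", "y")}
exact = resolve_group(fitted, ("x", "y"))
parent = resolve_group(fitted, ("x", "z"))
top = resolve_group(fitted, ("q", "z"))
```

Keeping the lookup separate from the estimators means the same resolution logic could serve both a predictor and a transformer, since each only needs to know which fitted object to call for a given row.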