koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.28k stars, 117 forks

[FEATURE] Enable GroupedTransformer to work with .set_output(transform="pandas") #696

Closed: olivier-s-j closed this issue 3 months ago

olivier-s-j commented 3 months ago

The set_output API was recently introduced in scikit-learn: https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_set_output.html

However, unless I am mistaken, scikit-lego is not yet compatible with it. It would be nice to have this compatibility.

I came across this incompatibility when creating:

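from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklego.meta import GroupedTransformer

# categorical_features and continuous_features are column selections defined elsewhere
ColumnTransformer(
    transformers=[
        (
            "cat",
            OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
            categorical_features,
        ),
        (
            "cont",
            GroupedTransformer(StandardScaler(), groups=["GROEP_A", "GROEP_B"]),
            continuous_features,
        ),
    ],
    verbose_feature_names_out=False,
).set_output(transform="pandas")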

Which results in:

ValueError: Unable to configure output for GroupedTransformer(groups=['GROEP_A', 'GROEP_B'],
                   transformer=StandardScaler()) because `set_output` is not available.

FBruzzesi commented 3 months ago

Hey there! Thanks for reporting this issue. This should certainly be possible, and implementing get_feature_names_out should be enough (as explained in the official doc section).

@olivier-s-j please let us know if you would be interested in opening a PR tackling GroupedTransformer.

@koaning this could be a nice addition and a quick win for most transformers honestly.