scikit-learn-contrib / sklearn-pandas

Pandas integration with sklearn

Column naming: compatibility with OneHotEncoder #241

Open stacymiller opened 3 years ago

stacymiller commented 3 years ago

In sklearn v0.24.1, the OneHotEncoder transformer exposes derived names in the categories_ attribute. Can we add one more check to https://github.com/scikit-learn-contrib/sklearn-pandas/blob/e84274643369fc6f75ca4b1b08824e188e96cd3f/sklearn_pandas/dataframe_mapper.py#L40 to cover this case?

ragrawal commented 3 years ago

Sure, can you create an MR and add a unit test? I will be happy to merge it.

falcaopetri commented 3 years ago

The categories_ attribute does not represent the derived feature names. It actually contains "the categories of each feature determined during fitting" (see OneHotEncoder.categories_).
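
To illustrate the difference, here is a minimal sketch in plain Python (it does not import sklearn). It assumes the `<feature>_<category>` naming scheme that OneHotEncoder uses for its output columns. The sample values and the `derive_names` helper are hypothetical, not part of sklearn or sklearn-pandas.

```python
# categories_ holds one collection of categories per input feature.
# Plain lists stand in for the fitted attribute here (illustrative values).
categories_ = [["blue", "red"], ["L", "M", "S"]]
input_features = ["color", "size"]


def derive_names(input_features, categories):
    """Hypothetical helper: one output name per (feature, category) pair."""
    return [
        f"{feature}_{category}"
        for feature, cats in zip(input_features, categories)
        for category in cats
    ]


derived = derive_names(input_features, categories_)
print(len(categories_))  # one entry per input feature
print(derived)           # one name per one-hot output column
```

So categories_ has one entry per input column, while the derived names have one entry per output column. The two only line up after combining the categories with the input feature names.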

Nonetheless, as of sklearn 1.0 the transformers' get_feature_names method is deprecated in favor of get_feature_names_out. More info in PR #248.
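
For reference, a name lookup that copes with both the old and the new method could look roughly like the sketch below. This is only an assumption about how such a check might be written; it is not the current sklearn-pandas code, and not necessarily what PR #248 does. `resolve_feature_names` and the two stub classes are hypothetical names used for illustration.

```python
def resolve_feature_names(estimator):
    """Hypothetical sketch: return derived names, or None if unavailable."""
    # Prefer the method introduced in sklearn 1.0.
    if hasattr(estimator, "get_feature_names_out"):
        return list(estimator.get_feature_names_out())
    # Fall back to the older method, deprecated in sklearn 1.0.
    if hasattr(estimator, "get_feature_names"):
        return list(estimator.get_feature_names())
    return None


class NewStyleStub:
    """Stub standing in for a fitted transformer with the new method."""

    def get_feature_names_out(self):
        return ["x0_a", "x0_b"]


class OldStyleStub:
    """Stub standing in for a fitted transformer with only the old method."""

    def get_feature_names(self):
        return ["x0_a", "x0_b"]


new_names = resolve_feature_names(NewStyleStub())
old_names = resolve_feature_names(OldStyleStub())
no_names = resolve_feature_names(object())  # no naming method at all
```

Checking get_feature_names_out first means the deprecated method is never called on transformers that already provide the replacement.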