scikit-learn-contrib / category_encoders

A library of sklearn compatible categorical variable encoders
http://contrib.scikit-learn.org/category_encoders/
BSD 3-Clause "New" or "Revised" License

get_feature_names_out is incompatible with sklearn estimators and eli5, consequently #408

Closed DZIMDZEM closed 1 year ago

DZIMDZEM commented 1 year ago

Expected Behavior

In BaseEncoder, get_feature_names_out() should accept an input_features argument in addition to self, as other sklearn estimators do.

def get_feature_names_out(self, input_features=None):
    """
    ...
    """
    return _check_feature_names_in(self, input_features)

Actual Behavior

BaseEncoder's get_feature_names_out() accepts only one argument: self. This makes it incompatible with eli5 and with other modules that pass feature names between sklearn components.

def get_feature_names_out(self) -> List[str]:
    """..."""
    if not isinstance(self.feature_names_out_, list):
        raise NotFittedError("Estimator has to be fitted to return feature names.")
    else:
        return self.feature_names_out_
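
To make the failure concrete, here is a minimal sketch that needs neither sklearn nor category_encoders. OldStyleEncoder and names_via_sklearn_convention are hypothetical stand-ins, not library code: the first mimics the one-argument signature shown above, the second mimics a caller that passes the input feature names along.

```python
class OldStyleEncoder:
    """Hypothetical stand-in for an encoder with the one-argument signature."""

    def __init__(self):
        # Made-up output names, only for illustration.
        self.feature_names_out_ = ["color_1", "color_2"]

    def get_feature_names_out(self):
        return self.feature_names_out_


def names_via_sklearn_convention(transformer, input_features):
    # Callers following the sklearn convention pass input_features along.
    return transformer.get_feature_names_out(input_features)


try:
    names_via_sklearn_convention(OldStyleEncoder(), ["color"])
    error = None
except TypeError as exc:
    # Raised because the method takes no argument besides self.
    error = exc
```

Calling the method with no arguments still works, which is why the problem only shows up once the encoder is used from code that forwards feature names.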

Steps to Reproduce the Problem

  1. Add input_features keyword argument to get_feature_names_out.
  2. Copy or reuse the _check_feature_names_in helper from sklearn.utils.validation so that get_feature_names_out behaves the same way as in sklearn's own estimators.
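
The two steps above could look roughly like the sketch below. It is an assumption-laden illustration, not the category_encoders implementation: SketchEncoder, its fit method, and the "_enc" suffix are hypothetical, and the check against the fitted column names is only loosely modelled on what sklearn's helper does.

```python
from typing import List, Optional, Sequence


class SketchEncoder:
    """Hypothetical encoder; not the category_encoders implementation."""

    def __init__(self) -> None:
        self.feature_names_in_: Optional[List[str]] = None
        self.feature_names_out_: Optional[List[str]] = None

    def fit(self, columns: Sequence[str]) -> "SketchEncoder":
        # Pretend each input column yields exactly one encoded output column.
        self.feature_names_in_ = list(columns)
        self.feature_names_out_ = [f"{c}_enc" for c in columns]
        return self

    def get_feature_names_out(
        self, input_features: Optional[Sequence[str]] = None
    ) -> List[str]:
        if self.feature_names_out_ is None:
            raise ValueError("Estimator has to be fitted to return feature names.")
        # Accept input_features, but reject names that differ from those seen in fit.
        if input_features is not None and list(input_features) != self.feature_names_in_:
            raise ValueError("input_features does not match the columns seen during fit.")
        return self.feature_names_out_


encoder = SketchEncoder().fit(["color", "size"])
names_default = encoder.get_feature_names_out()
names_explicit = encoder.get_feature_names_out(["color", "size"])
```

With this signature, calls with and without input_features both succeed, so code that forwards feature names no longer hits a TypeError.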

As a temporary workaround, you can override the method. Example for TargetEncoder:

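from category_encoders import TargetEncoder

class TargetEncoderFixed(TargetEncoder):
    def get_feature_names_out(self, *args, **kwargs):
        # Accept and ignore extra arguments such as input_features.
        return self.feature_names_out_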

PaulWestenthanner commented 1 year ago

I think this was fixed by #398. Could you please confirm? On the current master branch, get_feature_names_out already supports the input_features argument. I haven't built a release for this bugfix yet, though, so if you install from PyPI you will still experience this problem. I can build a release this week if it solves your problem.

DZIMDZEM commented 1 year ago

@PaulWestenthanner, yes, it resolves the problem. Thank you for the fast response!

PaulWestenthanner commented 1 year ago

Version 2.6.1 is now published on PyPI.