scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Incompatible with RFECV #189

Open jcrudy opened 5 years ago

jcrudy commented 5 years ago

The Earth model seems to be unusable as the estimator in RFECV. See https://github.com/scikit-learn-contrib/py-earth/issues/188#issuecomment-442088727.

hdinh commented 5 years ago

The issue is that Earth's `coef_` has one entry per basis function retained after pruning, whereas RFECV expects the estimator's `coef_` to have one entry per input feature (shape `n_features`).

RFECV will also inspect `feature_importances_` if it's available. So using Earth with any of sklearn's `feature_selection` APIs could work if `coef_` were renamed to something like `basis_coef_`, along with computing `feature_importances_`.

https://github.com/scikit-learn/scikit-learn/blob/ab2f539a32b8099a941cefc598c9625e830ecfe4/sklearn/feature_selection/rfe.py#L186
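To make the mismatch concrete, here is a minimal pure-Python sketch of the lookup order described above: `coef_` is checked first, and `feature_importances_` is only used when `coef_` is absent. It is a simplified paraphrase, not scikit-learn's actual code. `StubEarth`, `HiddenCoef` and `pick_scores` are hypothetical names invented for illustration, and the explicit length check is an assumption added to surface the problem early; the real failure mode inside RFECV may look different.

```python
class StubEarth:
    """Hypothetical stand-in for a fitted Earth model (not the real class)."""

    def __init__(self, n_basis_terms, n_features):
        # One coefficient per retained basis function, not per input feature.
        self.coef_ = [0.5] * n_basis_terms
        # One importance score per input feature.
        self.feature_importances_ = [1.0 / n_features] * n_features


class HiddenCoef:
    """Delegates to a wrapped model but pretends coef_ does not exist."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only called when normal lookup fails, so a set _inner never lands here.
        if name in ("coef_", "_inner"):
            raise AttributeError(name)
        return getattr(self._inner, name)


def pick_scores(estimator, n_features):
    """Simplified paraphrase: prefer coef_, else fall back to feature_importances_."""
    if hasattr(estimator, "coef_"):
        source, scores = "coef_", estimator.coef_
    else:
        source = "feature_importances_"
        scores = getattr(estimator, "feature_importances_", None)
    if scores is None:
        raise RuntimeError("estimator exposes neither coef_ nor feature_importances_")
    # Illustrative check (an assumption): scores must line up with the features.
    if len(scores) != n_features:
        raise ValueError(f"{source} has length {len(scores)}, expected {n_features}")
    return source, scores
```

With five basis terms and two features, `pick_scores(StubEarth(5, 2), 2)` raises because `coef_` is picked up first and has the wrong length, while `pick_scores(HiddenCoef(StubEarth(5, 2)), 2)` falls through to `feature_importances_`. That is the same effect the rename to `basis_coef_` would have.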

For example, this hacky proof of concept could work for a case similar to https://github.com/scikit-learn-contrib/py-earth/issues/188#issuecomment-442088727:

from pyearth import Earth
from sklearn.datasets import make_regression  # samples_generator is gone from newer scikit-learn
from sklearn.feature_selection import RFECV

class EarthWrapper(object):
    def __init__(self, **kwargs):
        self.earth = Earth(**kwargs)

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, d):
        self.__dict__ = d

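    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == 'coef_':
            # Hide coef_ from RFECV so that it uses feature_importances_.
            raise AttributeError(name)
        elif name == 'earth':
            # Reaching this branch means self.earth is not set yet;
            # returning self.earth here would recurse forever.
            raise AttributeError(name)
        elif hasattr(self.earth, name):
            return getattr(self.earth, name)
        raise AttributeError(name)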

X, y = make_regression(n_features=2)

model = RFECV(
    estimator=EarthWrapper(
        feature_importance_type='gcv', # feature_importances_ needs to be computed
    ),
    cv=3,
)

model.fit(X, y)