Open jcrudy opened 5 years ago
The issue is that `Earth`'s `coef_` has the shape of the pruned basis, whereas `RFECV` expects the estimator's `coef_` to have the shape of `num_features`. `RFECV` will also inspect `feature_importances_` if it's available. So using `Earth` with any of sklearn's `feature_selection` APIs could work if `coef_` was renamed to something like `basis_coef_`, along with computing `feature_importances_`.
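To make the mismatch concrete, here is a minimal pure-Python sketch of the lookup described above. The function `get_importances` and both model classes are hypothetical stand-ins with made-up numbers; this is a simplification for illustration, not scikit-learn's or py-earth's actual code.

```python
# Simplified, hypothetical stand-in for the attribute lookup a recursive
# feature eliminator performs; not scikit-learn's actual implementation.

def get_importances(estimator, n_features):
    """Return one importance score per input feature, or raise ValueError."""
    # Prefer coef_, then fall back to feature_importances_.
    for attr in ("coef_", "feature_importances_"):
        values = getattr(estimator, attr, None)
        if values is None:
            continue
        if len(values) != n_features:
            raise ValueError(
                f"{attr} has length {len(values)}, expected {n_features}"
            )
        return [abs(v) for v in values]
    raise ValueError("estimator exposes neither coef_ nor feature_importances_")


class BasisCoefModel:
    """Hypothetical model: coef_ has one entry per basis term (3), not per feature (2)."""
    coef_ = [0.5, -1.2, 0.3]
    feature_importances_ = [0.7, 0.3]


class RenamedCoefModel:
    """Same model with the coefficients exposed under the suggested name."""
    basis_coef_ = [0.5, -1.2, 0.3]
    feature_importances_ = [0.7, 0.3]


# coef_ is found first, and its length does not match the feature count.
try:
    get_importances(BasisCoefModel(), n_features=2)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# With coef_ out of the way, the lookup falls back to feature_importances_.
renamed_scores = get_importances(RenamedCoefModel(), n_features=2)
```

In this sketch the first model fails because `coef_` is picked up before `feature_importances_`, while the renamed variant falls through to the per-feature scores.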
For example, a hacky proof of concept like the following could work for a case similar to https://github.com/scikit-learn-contrib/py-earth/issues/188#issuecomment-442088727
```python
from pyearth import Earth
# sklearn.datasets.samples_generator was removed in newer scikit-learn releases
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV


class EarthWrapper(object):
    def __init__(self, **kwargs):
        self.earth = Earth(**kwargs)

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, d):
        self.__dict__ = d

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name == 'coef_':
            raise AttributeError(name)  # hide from RFECV so that it uses feature_importances_
        elif name == 'earth':
            # self.earth is not set yet (e.g. while unpickling); avoid infinite recursion
            raise AttributeError(name)
        elif hasattr(self.earth, name):
            return getattr(self.earth, name)
        raise AttributeError(name)


X, y = make_regression(n_features=2)
model = RFECV(
    estimator=EarthWrapper(
        feature_importance_type='gcv',  # feature_importances_ needs to be computed
    ),
    cv=3,
)
model.fit(X, y)
```
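The trick the wrapper relies on can be checked in isolation, without py-earth or scikit-learn installed. The sketch below uses hypothetical stand-in classes (`Inner`, `HidingWrapper`) and made-up values; it only demonstrates that raising `AttributeError` from `__getattr__` makes `hasattr` report the attribute as missing while other attributes are still delegated.

```python
class Inner:
    """Hypothetical stand-in for a fitted model."""
    coef_ = [0.5, -1.2, 0.3]
    feature_importances_ = [0.7, 0.3]


class HidingWrapper:
    """Delegates attribute access to `inner`, except for names in `hidden`."""
    hidden = frozenset({"coef_"})

    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, name):
        # Only called when normal lookup fails. Refusing "inner" here
        # prevents infinite recursion if it has not been set yet.
        if name in self.hidden or name == "inner":
            raise AttributeError(name)
        return getattr(self.inner, name)


wrapped = HidingWrapper(Inner())
has_coef = hasattr(wrapped, "coef_")        # hidden attribute looks absent
importances = wrapped.feature_importances_  # delegated to the inner object
```

Any code that probes with `hasattr(estimator, "coef_")` would then skip `coef_` and move on to whatever fallback it supports.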
The `Earth` model seems to be unusable as the estimator in `RFECV`. See https://github.com/scikit-learn-contrib/py-earth/issues/188#issuecomment-442088727.