scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Usability: Earth.fit exception when fitted multiple times on different X num_features #198

Open hdinh opened 5 years ago

hdinh commented 5 years ago

Earth.fit raises "ValueError: Wrong number of columns in X. Reshape your data." when the same Earth instance is fitted a second time on an X with a different number of features. From a usability standpoint this is surprising, because it is inconsistent with other sklearn estimators.

As a user, I would expect the estimator to discard the previous basis_ and refit from scratch on each fit call.
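
For comparison, here is a minimal sketch of the refit behaviour expected here. It uses scikit-learn's LinearRegression as a stand-in, which is an illustrative choice and not part of the original report. Each fit call replaces the previously learned coefficients, so the number of features can change between calls:

```python
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression

reg = LinearRegression()

# First fit: 6 features.
X, y = make_friedman1(n_samples=100, n_features=6, random_state=0)
reg.fit(X, y)
n_coef_first = reg.coef_.shape[0]

# Second fit on the same instance: 5 features. No exception is raised,
# and the learned state is replaced rather than reused.
X2, y2 = make_friedman1(n_samples=100, n_features=5, random_state=0)
reg.fit(X2, y2)
n_coef_second = reg.coef_.shape[0]

print(n_coef_first, n_coef_second)
```

The coefficient vector follows the width of whichever X was passed to the most recent fit call.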

from sklearn.base import RegressorMixin
from sklearn.datasets import make_friedman1
# On scikit-learn >= 0.22 this is also importable as: from sklearn.utils import all_estimators
from sklearn.utils.testing import all_estimators
from pyearth import Earth

def check_refit(est):
    # Fit on 6 features, then refit the same instance on 5 features.
    X, y = make_friedman1(100, 6)
    est.fit(X, y)
    X2, y2 = make_friedman1(100, 5)
    est.fit(X2, y2)

# Collect every regressor that scikit-learn ships.
regressors = [(klass_name, klass) for klass_name, klass in all_estimators() if issubclass(klass, RegressorMixin)]
for klass_name, klass in regressors:
    if 'MultiTask' in klass_name:
        continue  # MultiTask* regressors expect a 2-D y
    check_refit(klass())

check_refit(Earth())  # the only one that raises: ValueError: Wrong number of columns in X. Reshape your data.

jcrudy commented 4 years ago

@hdinh Thanks for reporting this. This should be an easy enough fix, but I won't have time to make it for a while. If you're interested, I would welcome a pull request and would have time to review and offer feedback. Also, I'm curious what the use case is for fitting the same model on new data. I'm probably missing something here, but wouldn't it make more sense to just create a new model?
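
Until fit resets its own state, creating a new model per dataset is a workable approach. A minimal sketch using sklearn.base.clone, which returns an unfitted copy with the same constructor parameters. LinearRegression stands in for Earth so the snippet runs without py-earth installed. The helper name fit_fresh is hypothetical, and that clone works unchanged on an Earth instance is an assumption:

```python
from sklearn.base import clone
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression

def fit_fresh(template, X, y):
    # Hypothetical helper: never refit `template` itself; fit an unfitted
    # copy that shares its hyperparameters.
    return clone(template).fit(X, y)

template = LinearRegression(fit_intercept=False)  # stand-in for Earth(...)

X, y = make_friedman1(n_samples=100, n_features=6, random_state=0)
X2, y2 = make_friedman1(n_samples=100, n_features=5, random_state=0)

model_a = fit_fresh(template, X, y)    # fitted on 6 features
model_b = fit_fresh(template, X2, y2)  # fitted on 5 features
```

The template is never fitted, so no state can carry over from one dataset to the next. With py-earth installed, template would be an Earth instance instead.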