Open nikrepp opened 6 years ago
@nikrepp I don't see any obvious problems with what you're doing. That seems like a pretty severe issue, though, so I'm surprised to be seeing it for the first time now. Here are a few questions that might help me:
pyearth.__version__?
Hello Jason,
Please see my answers to your questions below.
import pandas as pd
import numpy as np

dataset = pd.read_csv('....csv', sep=',', encoding='cp1251')
dataset = dataset.head(10000)

# Target column name translates to 'Refinancing flag'
y = dataset[u'Флаг рефинансирования']
X = dataset.drop(dataset.columns[[0,1,2,3,6]], axis=1)
import pyearth
import scipy
import sklearn
import numpy

print(pyearth.__version__)
print(numpy.__version__)
print(scipy.__version__)
print(sklearn.__version__)
import numpy
from pyearth.earth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth',Earth(max_degree=4,max_terms=10, minspan_alpha=10, verbose=True, enable_pruning=False)),
                  ('enet',ElasticNet(l1_ratio=0.0,alpha=1.0))])

X = StandardScaler().fit_transform(X)
model.fit(X, y)
Stopping Condition 0: Reached maximum number of terms
C:\Users\I304909\AppData\Local\Continuum\Miniconda2\envs\tensorflow\lib\site-packages\sklearn\linear_model\coordinate_descent.py:491: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting data with very small alpha may cause precision problems. ConvergenceWarning)
Out[4]:
Pipeline(memory=None, steps=[('earth', Earth(allow_linear=None, allow_missing=False, check_every=None, enable_pruning=False, endspan=None, endspan_alpha=None, fast_K=None, fast_h=None, feature_importance_type=None, max_degree=4, max_terms=10, min_search_points=None, minspan=None, minspan_alpha=10, penalty=None, ...alse, precompute=False, random_state=None, selection='cyclic', tol=0.0001, warm_start=False))])
import pandas as pd
import numpy as np

dataset = pd.read_csv('C:/.../Census01.csv', sep=';', encoding='utf8')
dataset = dataset

for i in dataset.columns:
    dataset[i] = dataset[i].factorize()[0].astype(np.int32)

y=dataset['age']
X = dataset.drop(dataset.columns[[0]], axis=1)
model2 = Pipeline([('earth',Earth(max_degree=4,max_terms=10, verbose=True, enable_pruning=False)),
                   ('enet',ElasticNet(l1_ratio=0.0,alpha=1.0))])

X = StandardScaler().fit_transform(dataset)
model2.fit(X, y)
Stopping Condition 0: Reached maximum number of terms
Windows 10. Python: I tested both 2.7 and 3 (same behavior). PyEarth, NumPy, SciPy, scikit-learn versions: 0.1.0, 1.13.3, 1.0.0, 0.19.1.
I tried different installation methods: first building from source, and most recently through Conda. The behavior is the same.
Thanks! I am also very interested in what is causing this.
@nikrepp Thanks for all the info. In the code you pasted above, you set max_terms to 10, and the forward pass terminated after 5 iterations. That is expected behavior as each iteration produces 2 terms (assuming it finds a knot that is superior to the linear term). Is that the problem you are observing, or is there other worse behavior you're seeing? The reason it goes to iteration 9 on the UCI data set is that it is picking linear basis functions (knot = -1), which only add one term each.
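The term arithmetic described above can be sketched in a few lines of plain Python. This is only an illustration, not py-earth's code: the function name `forward_pass_iterations` is hypothetical, and the stopping rule (start from one intercept term, stop once the term count reaches `max_terms`) is an assumption chosen to be consistent with the iteration counts mentioned in this comment.

```python
def forward_pass_iterations(max_terms, terms_per_iteration):
    """Count iterations of a simplified forward pass.

    Assumptions (not taken from py-earth's source):
    - the model starts with a single intercept term (iteration 0);
    - every later iteration adds `terms_per_iteration` terms
      (2 for a hinge pair, 1 for a linear basis function);
    - the pass stops once the term count reaches `max_terms`.
    """
    terms = 1          # intercept only
    iterations = 0
    while terms < max_terms:
        terms += terms_per_iteration
        iterations += 1
    return iterations


# Hinge pairs: two terms per iteration
print(forward_pass_iterations(10, 2))
# Linear basis functions (knot = -1): one term per iteration
print(forward_pass_iterations(10, 1))
```

Under these assumptions, `max_terms=10` is reached after 5 iterations when each one adds a hinge pair, and after 9 iterations when each one adds a single linear term.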
Hello Jason,
Fortunately, I can no longer reproduce the weird behaviour, so I suspect it was caused by a corrupted install from source under Python 2 on Windows.
Thank you for all the details. I am looking forward to the further development of this framework: objectives for classification problems, better support for categorical predictors, and interpretation of fitted relationships.
Thanks!
P.S. I would be glad to have the opportunity to contribute to one of these topics.
Regards, Nikita
Hello, colleagues,
I have the following problem: I am using PyEarth for a classification task on a dataset with 300000 rows and more than 500 features. I set max_terms to a sufficiently high number (100), but after two iterations the forward pass stops and the message "Stopping Condition 0: Reached maximum number of terms" appears.
import numpy
from pyearth import Earth
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

model = Pipeline([('earth',Earth(max_degree=4,max_terms=100, verbose=True, enable_pruning=False)),
                  ('enet',ElasticNet(l1_ratio=0.0,alpha=1.0))])

X_t = StandardScaler().fit_transform(X_t)
model.fit(X_t, Y_t*100)
Beginning forward pass
iter parent var knot mse terms gcv rsq grsq
0 - - - 34.148441 1 34.149 0.000 0.000
1 0 180 114453 34.135289 3 34.137 0.000 0.000
Stopping Condition 0: Reached maximum number of terms
Maybe I am just doing something wrong? From the metrics I got, I can see that the model is fairly robust but underfitted.
Nikita