Open Fish-Soup opened 7 years ago
You can actually do this right now using a combination of py-earth and scikit-learn. Just pass enable_pruning=False
to the Earth
constructor, then put the Earth
model in a Pipeline
with a ElasticNet
from scikit-learn. If you want to refit just the ElasticNet
part without refitting the Earth
part, though, then you'll have to do that manually. At least, I think so. You might look in the docs or write to the scikit-learn mailing list to make sure.
Ahh I don't think I quite understood how flexible pipelines where. I'll definitely take a look into this. Apart from the ability to warm start (after the MARS bit) can you see any advantages/disadvantages of either method. I assume using elastic net should be faster? (even with out a warm start).
Back to #157 do you have a feel for how much extra data i would have to add to change the output of the forward pass. I assume if i refitted with 1% more data it would have little effect? I suppose I'm asking if you know how it scales in general or would this be set by the problem at hand.
Many thanks
I don't know the advantages or disadvantages of using elasticnet vs the regular pruning method. You'll have to experiment. I would be curious to know the results.
Regarding stability of the forward pass, you can generally expect only slightly different predictions for small differences in data. However, the actual model could be substantially different. For example, if you have two highly correlated predictors, a small change in training input could affect which variable gets selected.
Hi so I've tried to put this into practice and its been a partial success. I have tried Earth=>Ridge, Earth=>Lasso and Earth => Elastic net pipelines. In the example case I made Ridge performs well, having similar accuracy to earth but taking half as long. LASSO performs less well, which makes sense as it will be removing some of the terms Earth created.
What I don't understand is that Elastic Net performs worse than both. There's a parameter in earth called l1_ratio which is supposed to define how like Lasso or Ridge it is. Setting l1_ratio to 1 does make earth act like lasso regression but setting it to 0 does not make it like Ridge. Which is odd.
I suppose something in elastic net is reacting poorly to the model MARS is sending it. Do you have any idea why this is? Are the hinge functions maybe exposed to Lasso and Elastic net? I know you are supposed to normalize the data but in my example all my xdata is normalized to start with.
Many thanks
My code is below
from sklearn.ensemble import AdaBoostRegressor from sklearn.metrics import r2_score from sklearn.cross_validation import train_test_split from sklearn.grid_search import GridSearchCV from pyearth import Earth import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.pipeline import Pipeline from sklearn.linear_model import ElasticNet, Lasso, Ridge from datetime import datetime
import math
def func(X): return 10. np.exp(.8np.abs(X['x0']))+25.np.sin(X['x0']math.pi)
np.random.seed(1) n = 100000 X = pd.DataFrame(np.random.normal(0.,1.,size=n),columns=['x0']) X['x1'] = np.random.uniform(0.,1.0,size=n) X['x2'] = np.random.normal(0.,1.0,size=n) X['x3'] = np.random.normal(0.,1.0,size=n) X['x4'] = np.random.normal(0.,1.0,size=n)
y = func(X)+ 50. *np.random.normal(0.,1.0,size=n)
plt.plot(X['x0'],y,"bo") plt.show()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=1)
dt=datetime.now() earth_net_0 = Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('enet',ElasticNet(l1_ratio=0.0,alpha=1.0))]) earth_net_0.fit(X_train,y_train) print "net= ",datetime.now()-dt
dt=datetime.now() earth_net_1 = Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('enet',ElasticNet(l1_ratio=1.0,alpha=1.0))]) earth_net_1.fit(X_train,y_train) print "net= ",datetime.now()-dt
dt=datetime.now() earth_net= Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('enet',ElasticNet())]) earth_net.fit(X_train,y_train) print "net= ",datetime.now()-dt
dt=datetime.now() earth_lasso = Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('lasso',Lasso())]) earth_lasso.fit(X_train,y_train) print "lasso= ",datetime.now()-dt
dt=datetime.now() earth_ridge = Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('ridge',Ridge())]) earth_ridge.fit(X_train,y_train) print "ridge= ",datetime.now()-dt
dt=datetime.now() earth_model = Earth(max_degree=4) earth_model.fit(X_train,y_train) print "earth= ",datetime.now()-dt
dt=datetime.now() earth_overfit =Earth(max_degree=4,enable_pruning=False) earth_overfit.fit(X_train,y_train) print "overfit earth= ",datetime.now()-dt
Y_test_predict_earth_net=earth_net.predict(X_test) Y_test_predict_earth_net_0=earth_net_0.predict(X_test) Y_test_predict_earth_net_1=earth_net_1.predict(X_test) Y_test_predict_earth_lasso=earth_lasso.predict(X_test) Y_test_predict_earth_ridge=earth_ridge.predict(X_test) Y_test_predict_earth=earth_model.predict(X_test) Y_test_predict_earth_overfit=earth_overfit.predict(X_test)
print "~~~"
print 'Earth Overfit score %f' % r2_score(y_test, Y_test_predict_earth_overfit) print 'Earth Score: %f' % r2_score(y_test, Y_test_predict_earth) print 'Earth Ridge score %f' % r2_score(y_test, Y_test_predict_earth_ridge) print 'Earth Lasso Score: %f' % r2_score(y_test, Y_test_predict_earth_lasso) print 'Earth Net Score: %f' % r2_score(y_test, Y_test_predict_earth_net)
plt.plot(X_test['x0'],func(X_test),"k.",label="true data")
plt.plot(X_test['x0'],Y_test_predict_earth,"go",label="earth")
plt.plot(X_test['x0'],Y_test_predict_earth_lasso,"cx",label="earth lasso") plt.plot(X_test['x0'],Y_test_predict_earth_ridge,"mx",label="earth ridge") plt.plot(X_test['x0'],Y_test_predict_earth_net,"b.",label="earth net") plt.plot(X_test['x0'],Y_test_predict_earth_net_0,"g.",label="earth net_0") plt.plot(X_test['x0'],Y_test_predict_earth_net_1,"r.",label="earth net_1")
plt.legend() plt.show()
@Fish-Soup I'm not super familiar with scikit-learn's elastic net, but if it's expecting normalized data then you might try putting a StandardScaler
in your pipeline between the Earth
and ElasticNet
steps. Normalized inputs to a MARS model will not generally result in normalized basis functions.
So I've just tried inserting StandardScaler into the pipelines but it doesn't seem to have helped. i think this is the right track though
earth_net= Pipeline([('earth',Earth(max_degree=4,enable_pruning=False)),('std_scal',StandardScaler()),('enet',ElasticNet())])
So as i understand it Lasso adds a regularization parameter that's the sum of the absolute values of each fitted parameter e.g |p1|+|p2| if we had a 2 parameter fit. Ridged does the sqrt(p1^2+p2^2). Lasso has no cross components and is good at choosing between two different variables e.g in the above case it would pick p1 or p2 but not both. Ridge lacks the capacity to ever remove any terms but (can take them close to 0). Its better when you have correlated variables, as it doesn't just pick one of the other but can contain a bit of both.
Elastic net is a combination of Ridge and Lasso where by the user can choose how much of one or the other to add. I was planning on using grid search to tune this. My confusion is i can tune this parameter so that it basically should make elastic net = ridge. But it doesn't..... Although i can make it the same to Lasso.
A consideration for all the above is that your xdata should be normalized so that mean =0 and std is constant for all data so that the scale of p1 and p2 is consistent (else certain terms will be chosen over the other just because the data has a larger range). Although all my xdata is normalized I am not sure if whatever earth exposes via the pipeline to ridge, elastic net or lasso is. Is it just fitted and before fitted data? or does it include hinges or something like that?
I find pipelines a bit hard to understand what is actually going on... many thanks
@Fish-Soup I'm not sure why the Elastic Net isn't behaving as expected. Perhaps the scikit-learn people can give you advice on that. In general, you can perform the steps of the Pipeline manually as follows (code untested):
model1 = Earth()
model1.fit(X, y)
model2 = StandardScaler()
model2.fit(model1.transform(X), y)
model3 = ElasticNet()
model3.fit(model2.transform(model1.transform(X)), y)
y_pred = model3.predict(model2.transform(model1.transform(X)))
That's really all it does. You can use this breakdown to debug your pipeline more easily.
Cool sorry to take so long to reply. It appears that one of the columns generated in model1.transform(X) is a set of one's. Which obviously can't be normalised. This doesn't seem to make much sense to me now I'm putting it down in writing. I'll check again in the morning.
@Fish-Soup One of the out columns from an Earth
model's transform method is a constant, representing the intercept term. If that's what's causing problems, you could add a stage to remove it, as it's unnecessary as an input to the ElasticNet
.
so It turns out my mistake was assuming that the alpha parameter in elastic net was the same as the one in ridge regression. because of the way it's structured.
fitting mars net is twice as fast as mars in the case we know the hyperparameters. doing a grid search over 100 by 10 for the 2 hyper parameters results in consistent fits between approaches.
in the case where not much regularization is required Mars is faster. in the case where there is lots of regularization mars net is faster.
as I am updating my models daily I plan to store my hyper parameters and periodicly test them. thus mars net will be faster. I'm also tempted to not refit the mars component of mars bet every time. I'm which case it will be factors of 100 faster.
After our discussion in #157. I decided to read up more on the advantages/disadvantages of using other regularization techniques. I found
CMARS 2012
CMARS 2008
,where the authors use Ridge (of Tikhonov) regularization within MARS after the initial forward pass to avoid over fitting. This seems an interesting approach and as far as i know would allow warm starts if the initial forward pass was skipped ( in reference to #157). I'm not sure how significant a piece of work with would be but it certainly seems like an interesting option. I suppose i would hope it could be extended to contain an Lasso + Ridge regression hybrid, like Elastic Net, which takes the advantages of both methods.
Thanks again for all the good work