rasbt / mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.

Does stacking regressor work with sklearn GridSearchCV() with hyper parameter search? #760

Closed jpandeinge closed 3 years ago

jpandeinge commented 3 years ago

I have a question about whether the stacking regressor supports sklearn's GridSearchCV() for hyperparameter tuning. Sample code:

seed = 1
from mlxtend.regressor import StackingCVRegressor
from sklearn.svm import SVR
from lightgbm import LGBMRegressor
from sklearn.linear_model import LassoCV,LinearRegression
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from catboost import CatBoostRegressor
from sklearn.model_selection import KFold

# linear Regression
lin_reg = LinearRegression(normalize =True, fit_intercept =False)

# CatBoost Regressor
cat_boost = CatBoostRegressor(random_seed=seed, depth=4)

# Epsilon-Support Vector Regression.
svr = SVR(C = 1,kernel='poly', degree = 5) 

# Lasso linear model with iterative fitting along a regularization path.
lasso = LassoCV(
  alphas=[0.0001, 0.0003, 0.0006, 0.001, 0.003, 0.006, 0.01, 0.03, 0.06, 0.1, 0.3, 0.6, 1],
  tol=1e-2,
  fit_intercept=True,
  cv=KFold(n_splits=5, shuffle=True, random_state=seed),
  normalize=True,
  n_jobs=-1,
  random_state=seed,
)

# decision tree regressor.
dt_regressor = DecisionTreeRegressor(max_depth=4,random_state = seed)

# random forest regressor.
rf = RandomForestRegressor(random_state=seed, n_estimators = 100, verbose=seed)

# xgb regressor 
xgb_regressor = XGBRegressor(n_estimators = 500,colsample_bytree=1,
                             objective='reg:squarederror', eval_metric ='rmse',
                             random_state= seed, verbose=seed)
# LightGBM regressor.
lgbm_regressor = LGBMRegressor(objective='regression',
                               boosting_type='rf', bagging_fraction=0.8, bagging_freq=1,
                               num_leaves=31, n_estimators=500, learning_rate=0.015,
                               random_state=seed, metric='rmse', verbose=seed)

#  AdaBoost regressor.
ada_boost = AdaBoostRegressor(dt_regressor, random_state = seed, n_estimators = 100)

forecaster = StackingCVRegressor(regressors=(lin_reg, lasso, svr, lgbm_regressor, xgb_regressor, cat_boost, ada_boost),
                            meta_regressor=lin_reg,
                            shuffle=True,
                            cv=10,
                            use_features_in_secondary=True)

from sklearn.model_selection import GridSearchCV

params = {'estimator__linearregression__fit_intercept': ['True', 'False'],
          'estimator__linearregression__normalize' : ['True', 'False']}

grid = GridSearchCV(estimator=forecaster, param_grid=params)

grid.fit(x_train, y_train)

print("Best: %f using %s" % (grid.best_score_, grid.best_params_))

I believe the error comes from the params variable I defined, but I can't see what is wrong with it: I took the parameter names for all the regressors from grid.get_params().keys().

However, the code above raises the following error:

ValueError: Invalid parameter estimator for estimator StackingCVRegressor(cv=10,
                                LassoCV(alphas=[0.0001, 0.0003, 0.0006, 0.001,
                                                0.003, 0.006, 0.01, 0.03, 0.06,
                                                0.1, 0.3, 0.6, 1],
                                        cv=KFold(n_splits=5, random_state=1, shuffle=True),
                                        n_jobs=-1, normalize=True,
                                        random_state=1, tol=0.01,
                                             random_state=1, reg_alpha=None,
                                             subsample=None, tree_method=None,
                                             verbose=1, verbosity=None),
                                <catboost.core.CatBoostRegressor object at 0x7f1bf9231a10>,
                    use_features_in_secondary=True). Check the list of available parameters with `estimator.get_params().keys()`.

Is there a way to tune the parameters of all the regressors to find the best ones and then use them in a new model? Is there a way to retain them, since I would like to do a hyperparameter search?
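
A note on the parameter names: grid.get_params() describes the GridSearchCV wrapper, so it reports every parameter of the wrapped model under an estimator__ prefix, whereas param_grid expects the names as the wrapped model itself reports them. A minimal pure-Python sketch of that mapping follows. The helper name to_param_grid_keys is hypothetical, and the key list is a short illustrative subset, not real output.

```python
def to_param_grid_keys(wrapper_keys, prefix="estimator__"):
    """Keep only the wrapped model's parameters and drop the wrapper prefix.

    Keys without the prefix (for example "cv" or "param_grid") belong to
    the search object itself and cannot be tuned through param_grid.
    """
    return [key[len(prefix):] for key in wrapper_keys if key.startswith(prefix)]


# Illustrative subset of what grid.get_params().keys() might contain.
wrapper_keys = [
    "cv",
    "estimator",
    "estimator__cv",
    "estimator__linearregression__fit_intercept",
    "estimator__linearregression__normalize",
    "param_grid",
]

grid_keys = to_param_grid_keys(wrapper_keys)
print(grid_keys)
```

Calling forecaster.get_params().keys() on the stacking regressor directly should give names that already have this form.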

jpandeinge commented 3 years ago

I solved it: I had defined the keys in the params variable incorrectly. I changed params from

params = {'estimator__linearregression__fit_intercept': ['True', 'False'],
          'estimator__linearregression__normalize' : ['True', 'False']}

to the following, removing the estimator__ prefix in front of every base model's name:

params = {'linearregression__fit_intercept': ['True', 'False'],
          'linearregression__normalize' : ['True', 'False']}