Open exalate-issue-sync[bot] opened 1 year ago
Michal Kurka commented: Paused work for now (will get back to it after the fix release)
Sebastien Poirier commented: Should we put this back in the pipe?
JIRA Issue Migration Info
Jira Issue: PUBDEV-5281
Assignee: Michal Kurka
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Here is the initial design idea for the Stacked Ensemble grid search (which is mostly a search of the metalearner hyperparameters, but could also include other params like metalearner_nfolds).
Python example:
{code}
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from h2o.grid.grid_search import H2OGridSearch

# x, y, train, my_gbm and my_rf are assumed to be defined already

metalearner_grid_params_gbm = {'max_depth': [2, 3, 4],
                               'col_sample_rate': [0.2, 0.5, 0.7]}
metalearner_grid_params_rf = {'ntrees': [200, 300, 400],
                              'col_sample_rate': [0.2, 0.5, 0.7]}

# set up SE grid, use hyper_params to pass a new value called metalearner_params
grid = H2OGridSearch(model=H2OStackedEnsembleEstimator,
                     hyper_params={'metalearner_grid_params': [
                         {'algorithm': "GBM", 'params': metalearner_grid_params_gbm},
                         {'algorithm': "DRF", 'params': metalearner_grid_params_rf}]},
                     seed=1,
                     search_criteria={'strategy': 'RandomDiscrete', 'max_models': 36})

grid.train(x=x, y=y, training_frame=train,
           seed=1,  # this is SE seed (not grid seed)
           base_models=[my_gbm, my_rf])  # pass along fixed SE params like base_models

# Single model (for comparison)
metalearner_gbm_params = {'max_depth': 2, 'col_sample_rate': 0.3}
ensemble = H2OStackedEnsembleEstimator(base_models=[my_gbm, my_rf],
                                       metalearner_algorithm="GBM",
                                       metalearner_params=metalearner_gbm_params)
ensemble.train(x=x, y=y, training_frame=train)
{code}
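To make the proposed semantics concrete, here is a rough pure-Python sketch of how a `metalearner_grid_params` list could be expanded into individual (algorithm, metalearner_params) candidates and then sampled for a random discrete search. It does not use H2O at all. `expand_metalearner_grid` and `sample_candidates` are hypothetical helper names invented for illustration, and sampling without replacement up to `max_models` is an assumption about how the grid would behave, not a description of any existing implementation.

```python
import random
from itertools import product


def expand_metalearner_grid(metalearner_grid_params):
    """Expand [{'algorithm': ..., 'params': {...}}, ...] into a flat list of
    (algorithm, metalearner_params) candidates (hypothetical helper)."""
    candidates = []
    for entry in metalearner_grid_params:
        names = sorted(entry['params'])  # fixed order for reproducibility
        value_lists = [entry['params'][name] for name in names]
        # One candidate per combination of hyperparameter values
        for values in product(*value_lists):
            candidates.append((entry['algorithm'], dict(zip(names, values))))
    return candidates


def sample_candidates(candidates, max_models, seed):
    """Pick at most max_models candidates without replacement (assumed
    behaviour for a random discrete search; hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(max_models, len(candidates)))


metalearner_grid_params_gbm = {'max_depth': [2, 3, 4],
                               'col_sample_rate': [0.2, 0.5, 0.7]}
metalearner_grid_params_rf = {'ntrees': [200, 300, 400],
                              'col_sample_rate': [0.2, 0.5, 0.7]}

candidates = expand_metalearner_grid([
    {'algorithm': "GBM", 'params': metalearner_grid_params_gbm},
    {'algorithm': "DRF", 'params': metalearner_grid_params_rf},
])
selected = sample_candidates(candidates, max_models=36, seed=1)

print(len(candidates))  # 9 GBM + 9 DRF combinations = 18
print(len(selected))    # capped at 18, since max_models exceeds the grid size
```

Under these assumptions each selected candidate maps directly onto the single-model call above: the algorithm becomes `metalearner_algorithm` and the dict becomes `metalearner_params`. With the example grids there are only 18 combinations, so `max_models: 36` would never be the binding limit.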