Add StackedEnsembles To AutoML's Time Budget

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Apache License 2.0


Currently, StackedEnsembles ignore the time constraint from AutoML, which is fine in many situations because the metalearner is usually very fast to train.

However, in some situations SEs take much longer than expected. This behavior was observed mainly on multinomial tasks. A good example is a 1h training run on the [Covertype dataset|https://www.openml.org/d/1596], which very often takes 1h30min, with the StackedEnsemble being the main contributing factor.
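To make the idea of a shared budget concrete, here is a minimal Python sketch of one way a run could hold back part of its wall-clock budget for the final ensembling step. It is not H2O's implementation: `TimeBudget`, `base_model_allowance`, and the `reserve_fraction` parameter are hypothetical names introduced only for illustration.

```python
import time


class TimeBudget:
    """Hypothetical helper: tracks one wall-clock budget shared by all steps."""

    def __init__(self, total_secs, clock=time.monotonic):
        self.total_secs = float(total_secs)
        self._clock = clock
        self._start = clock()

    def elapsed(self):
        return self._clock() - self._start

    def remaining(self):
        # Never report a negative budget, even after an overrun.
        return max(0.0, self.total_secs - self.elapsed())


def base_model_allowance(budget, reserve_fraction=0.1):
    """Seconds base models may still use, keeping a share back for the ensemble.

    `reserve_fraction` is an assumed tuning knob, not an H2O parameter.
    """
    reserved = budget.total_secs * reserve_fraction
    return max(0.0, budget.remaining() - reserved)
```

Under this scheme the ensembling step would be given `budget.remaining()` as its own limit, so the run as a whole cannot exceed the total by more than the time needed to stop the last step.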

The goal of this JIRA is to speed up AutoML training in cases where it uses more time than it should. This may affect the performance of the SE models in these cases, so another goal is to keep an SE in the top 5 models on 95+% of the datasets we use for benchmarking.
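The second goal can be checked mechanically. The sketch below computes the share of benchmark runs in which a stacked ensemble appears among the top-ranked models. It assumes each leaderboard is available as a list of model ids ordered best first, and that stacked ensemble ids start with `StackedEnsemble`; the function name `se_in_top_k_rate` is hypothetical.

```python
def se_in_top_k_rate(leaderboards, k=5, prefix="StackedEnsemble"):
    """Share of benchmark runs whose top-k models include a stacked ensemble.

    `leaderboards` maps a dataset name to its model ids, best model first.
    The id prefix is an assumption about how SE models are named.
    """
    if not leaderboards:
        raise ValueError("no leaderboards given")
    hits = sum(
        any(model_id.startswith(prefix) for model_id in ranked[:k])
        for ranked in leaderboards.values()
    )
    return hits / len(leaderboards)
```

The target stated above would then correspond to `se_in_top_k_rate(leaderboards) >= 0.95` over the benchmark suite.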

Add StackedEnsembles To AutoML's Time Budget #7420