Currently, StackedEnsembles ignore the time constraint set on AutoML. This is fine in many situations because the metalearner is usually very fast to train.
However, in some situations SEs take much longer than expected. This behavior was observed mainly on multinomial tasks. One good example is a 1h training run on the [Covertype dataset|https://www.openml.org/d/1596], which very often takes 1h30min, with the StackedEnsemble as the main contributing factor.
The goal of this JIRA is to speed up AutoML training in cases where it uses more time than it should. This may affect the performance of the SE models in those cases, so a second goal is to keep an SE among the top 5 models on at least 95% of the datasets we use for benchmarking.
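The second goal can be checked mechanically once benchmark leaderboards are collected. The following is only a sketch under assumptions: the helper names `se_top_k_rate` and `meets_goal` are hypothetical, each leaderboard is assumed to be a list of model ids ordered best first, and SE models are assumed to be recognizable by a `StackedEnsemble` id prefix.

```python
from typing import Dict, List


def se_top_k_rate(leaderboards: Dict[str, List[str]],
                  k: int = 5,
                  prefix: str = "StackedEnsemble") -> float:
    """Return the fraction of datasets with an SE model among the top k.

    `leaderboards` maps a dataset name to its model ids, best model first.
    (Hypothetical helper; the input layout is an assumption.)
    """
    if not leaderboards:
        raise ValueError("no leaderboards given")
    # Count datasets where any of the first k model ids is a StackedEnsemble.
    hits = sum(
        any(model_id.startswith(prefix) for model_id in ranked[:k])
        for ranked in leaderboards.values()
    )
    return hits / len(leaderboards)


def meets_goal(leaderboards: Dict[str, List[str]],
               k: int = 5,
               threshold: float = 0.95) -> bool:
    """True if an SE is in the top k on at least `threshold` of the datasets."""
    return se_top_k_rate(leaderboards, k) >= threshold
```

Running this on the benchmark results before and after the change would show whether the time-constrained SEs still satisfy the 95% target.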