h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add StackedEnsembles To AutoML's Time Budget #7420

Closed: exalate-issue-sync[bot] closed this issue 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Currently, Stacked Ensembles (SEs) ignore AutoML's time constraint. This is fine in most situations, since the metalearner is usually very fast to train.

However, in some situations SEs take much longer than expected; this has been observed mainly on multinomial tasks. For example, a 1-hour AutoML run on the Covertype dataset (https://www.openml.org/d/1596) very often takes 1 h 30 min, and the main contributing factor is the Stacked Ensemble.

The goal of this JIRA is to speed up AutoML training in cases where it runs longer than it should. Since this may affect the performance of the SE models in those cases, a second goal is to keep an SE among the top 5 models on 95+% of the datasets we use for benchmarking.
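The overrun can be seen as simple budget arithmetic. Below is a minimal, hypothetical Python sketch, not the change made in the linked PR: it assumes a fixed fraction of the AutoML time budget (`se_fraction`, an invented parameter) is reserved for Stacked Ensembles, and the names `split_budget` and `total_runtime` are illustrative only. It also assumes base models always use their entire allotment.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical split of an AutoML time budget, in seconds."""
    base_models: float
    stacked_ensembles: float


def split_budget(total_secs: float, se_fraction: float = 0.1) -> Budget:
    """Reserve a share of the total budget for Stacked Ensemble training.

    se_fraction is an illustrative assumption, not an H2O parameter.
    """
    if total_secs <= 0:
        raise ValueError("total_secs must be positive")
    if not 0.0 <= se_fraction < 1.0:
        raise ValueError("se_fraction must be in [0, 1)")
    se_secs = total_secs * se_fraction
    return Budget(base_models=total_secs - se_secs, stacked_ensembles=se_secs)


def total_runtime(total_secs: float, se_needed_secs: float,
                  se_in_budget: bool, se_fraction: float = 0.1) -> float:
    """Estimate the wall-clock time of a hypothetical AutoML run."""
    if not se_in_budget:
        # Behaviour described in the issue: base models consume the whole
        # budget, and SE training is added on top of it.
        return total_secs + se_needed_secs
    budget = split_budget(total_secs, se_fraction)
    # SE training is cut off once its reserved share is used up.
    return budget.base_models + min(se_needed_secs, budget.stacked_ensembles)


if __name__ == "__main__":
    # Illustrative numbers: a 1-hour budget and an SE that would need 30 min.
    print(total_runtime(3600, 1800, se_in_budget=False))
    print(total_runtime(3600, 1800, se_in_budget=True))
```

With these assumed numbers, leaving the SE outside the budget gives 5400 s of wall-clock time, while counting it caps the run at 3600 s. The trade-off is the one the issue names: a capped SE may have less time to train, which is why the second goal tracks whether an SE still ranks in the top 5.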

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8233
Assignee: Tomas Fryda
Reporter: Tomas Fryda
State: Resolved
Fix Version: 3.34.0.1
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/5569