h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Extremely slow GLM metalearner when training simple Stacked Ensemble models on specific dataset #7730

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Training a basic AutoML run on this dataset: https://www.openml.org/d/42734

{code:python}
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("https://www.openml.org/data/download/22044770/dataset")
aml = H2OAutoML(max_runtime_seconds=360)
aml.train(training_frame=train, y='job')
{code}

The training runs seemingly forever on the first Stacked Ensemble (SE). When running the benchmark (using max_runtime_seconds=3600), the app automatically kills the process after 2h, which is more than 1h after the SE started.

The only thing I noticed is a large number (hundreds) of "Ls failed" messages from the GLM metalearner in the logs:

{noformat}
11-23 16:23:41.144 127.0.0.1:54321 14800 FJ-4-11 INFO water.default: GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] Class 49 got 7 active columns out of 255 total
11-23 16:23:41.144 127.0.0.1:54321 14800 FJ-4-11 INFO water.default: GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] Class 50 got 15 active columns out of 255 total
11-23 16:23:51.038 127.0.0.1:54321 14800 FJ-4-11 INFO water.default: GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] Ls failed
11-23 16:23:52.748 127.0.0.1:54321 14800 FJ-4-11 INFO water.default: GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] Ls failed
11-23 16:23:54.481 127.0.0.1:54321 14800 FJ-4-11 INFO water.default: GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] Ls failed
{noformat}
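To quantify how often this happens, the "Ls failed" lines can be tallied per metalearner. This is only a small sketch: it assumes the log format matches the excerpt above (a `GLM[dest=...` prefix and a trailing `Ls failed`), and the helper name `count_ls_failures` is hypothetical, not part of H2O.

```python
import re
from collections import Counter

# Assumption: the model key follows "GLM[dest=" and ends at the next comma,
# as in the log excerpt from this issue.
DEST_PATTERN = re.compile(r"GLM\[dest=([^,\]]+)")


def count_ls_failures(log_lines):
    """Count lines ending in 'Ls failed', grouped by GLM destination key."""
    counts = Counter()
    for line in log_lines:
        if line.rstrip().endswith("Ls failed"):
            match = DEST_PATTERN.search(line)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Three lines copied from the excerpt: one progress line, two failures.
prefix = ("127.0.0.1:54321 14800 FJ-4-11 INFO water.default: "
          "GLM[dest=metalearner_AUTO_StackedEnsemble_BestOfFamily_AutoML_20201123_145215_cv_2, "
          "iter=0 lmb=.14E-3 alpha=.5E0 obj=1.1523 imp=.1E1 bdf=.11E2] ")
sample_lines = [
    "11-23 16:23:41.144 " + prefix + "Class 50 got 15 active columns out of 255 total",
    "11-23 16:23:51.038 " + prefix + "Ls failed",
    "11-23 16:23:52.748 " + prefix + "Ls failed",
]
failure_counts = count_ls_failures(sample_lines)
```

On a real log, the same function could be fed the lines of the file, e.g. `count_ls_failures(open(path))` with `path` pointing at the H2O log.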

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: After finding a coefficient update direction, GLM runs a line search to determine how large a step to take along that direction. If the line search fails, it will usually just quit. I need to spend more time on this.
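For readers unfamiliar with the term, here is a minimal sketch of a generic backtracking (Armijo) line search. It is not H2O's actual GLM code; the function name, parameters, and the toy quadratic objective are all assumptions chosen for illustration. It shows the two outcomes described above: a step size is accepted, or the search gives up and reports failure.

```python
def backtracking_line_search(objective, x, direction, gradient,
                             step=1.0, shrink=0.5, c=1e-4, max_halvings=20):
    """Generic Armijo backtracking; returns a step size, or None on failure.

    Illustrative only: this is not how H2O's GLM implements its line search.
    """
    fx = objective(x)
    # Directional derivative of the objective along the proposed direction.
    slope = sum(g * d for g, d in zip(gradient, direction))
    for _ in range(max_halvings):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        # Accept the step once it yields a sufficient decrease.
        if objective(candidate) <= fx + c * step * slope:
            return step
        step *= shrink
    return None  # the analogue of an "Ls failed" outcome


def quadratic(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(xi * xi for xi in x)


x0 = [3.0, -4.0]
grad = [2.0 * xi for xi in x0]   # gradient of the quadratic at x0
descent = [-g for g in grad]     # steepest-descent direction

accepted = backtracking_line_search(quadratic, x0, descent, grad)
# Searching along the ascent direction can never decrease the objective,
# so the search exhausts its halvings and fails.
failed = backtracking_line_search(quadratic, x0, grad, grad)
```

In this sketch a failure simply returns `None`; what the caller does next (stop, retry, or switch direction) is a separate design choice.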

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7915
Assignee: Wendy Wong
Reporter: Sebastien Poirier
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A