h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Cross-validated GLM lambda search stops early with early_stopping set to false #11736

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

The main model for a cross-validated GLM lambda search is computed only for lambdas up to the best lambda, regardless of the early stopping option. It should compute models for all lambdas when early stopping is set to false. (The per-fold models do respect the early stopping setting and compute models for all lambdas.)
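To make the requested behavior concrete, here is a minimal sketch, not H2O source code. The class `LambdaPathSketch`, the method `lambdasForMainModel`, and the lambda values are hypothetical names and numbers chosen for illustration. It assumes the lambda path is an array ordered as the search visits it and that `bestIndex` is the position of the best lambda found by cross-validation.

```java
import java.util.Arrays;

// Hypothetical illustration of the behavior requested in this issue; not H2O code.
final class LambdaPathSketch {

    // Returns the lambdas the main model should fit:
    // only up to the best lambda when early stopping is on,
    // the full path when early stopping is off.
    static double[] lambdasForMainModel(double[] lambdas, int bestIndex, boolean earlyStopping) {
        if (bestIndex < 0 || bestIndex >= lambdas.length) {
            throw new IllegalArgumentException("bestIndex out of range: " + bestIndex);
        }
        int count = earlyStopping ? bestIndex + 1 : lambdas.length;
        return Arrays.copyOf(lambdas, count);
    }

    public static void main(String[] args) {
        double[] path = {1.0, 0.5, 0.25, 0.125}; // made-up lambda path
        int bestIndex = 1;                       // made-up best lambda position

        System.out.println("early stopping on:  "
                + Arrays.toString(lambdasForMainModel(path, bestIndex, true)));
        System.out.println("early stopping off: "
                + Arrays.toString(lambdasForMainModel(path, bestIndex, false)));
    }
}
```

The bug described above corresponds to the main model always taking the first branch, even when early stopping is set to false.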

From SO: https://stackoverflow.com/questions/45890985/h2o-glm-lambda-search-not-appearing-to-iterate-over-all-lambdas

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: Hey, I found this older GLM ticket. It’s not urgent, but when you want a break from something else, could you check if this is still an issue?

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Erin:

There are two early-stopping mechanisms in GLM. One is the standard early stopping shared with all the other algorithms. The other is specific to GLM: it also stops early when the objective or the coefficients are no longer changing much. I disabled the second early stop if the first early stop is turned on.

{noformat}
public boolean converged(){
  boolean converged = false;
  if(_betaDiff < _parms._beta_epsilon) {
    convergenceMsg = "betaDiff < eps; betaDiff = " + _betaDiff + ", eps = " + _parms._beta_epsilon;
    converged = true;
  } else if(_relImprovement < _parms._objective_epsilon) {
    convergenceMsg = "relImprovement < eps; relImprovement = " + _relImprovement + ", eps = " + _parms._objective_epsilon;
    converged = true;
  } else
    convergenceMsg = "not converged, betaDiff = " + _betaDiff + ", relImprovement = " + _relImprovement;
  return converged;
}
{noformat}

If you want to turn off the GLM-specific early stop, you can set _parms._objective_epsilon and _parms._beta_epsilon to very small values.
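As a rough illustration of why very small epsilons suppress this check, here is a standalone sketch, not H2O source code. The class `ConvergenceSketch` is hypothetical, its two thresholds stand in for the beta and objective epsilon parameters from the snippet above, and all numeric values are made up rather than H2O defaults.

```java
// Hypothetical standalone sketch of the convergence test quoted above; not H2O code.
final class ConvergenceSketch {
    private final double betaEpsilon;      // stands in for _parms._beta_epsilon
    private final double objectiveEpsilon; // stands in for _parms._objective_epsilon

    ConvergenceSketch(double betaEpsilon, double objectiveEpsilon) {
        this.betaEpsilon = betaEpsilon;
        this.objectiveEpsilon = objectiveEpsilon;
    }

    // Same two comparisons as the quoted converged() method, without the messages.
    boolean converged(double betaDiff, double relImprovement) {
        return betaDiff < betaEpsilon || relImprovement < objectiveEpsilon;
    }

    public static void main(String[] args) {
        // Made-up per-iteration changes.
        double betaDiff = 1e-5;
        double relImprovement = 1e-7;

        ConvergenceSketch loose = new ConvergenceSketch(1e-4, 1e-6);
        ConvergenceSketch tight = new ConvergenceSketch(1e-12, 1e-12);

        System.out.println("loose thresholds converged: " + loose.converged(betaDiff, relImprovement));
        System.out.println("tight thresholds converged: " + tight.converged(betaDiff, relImprovement));
    }
}
```

With the looser thresholds the check fires and iteration would stop. With the tiny thresholds neither comparison succeeds for the same changes, so iteration would continue.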

Also, if you are running regression with no regularization, it is a one-step algorithm: there is no iteration and no early stopping.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4858
Assignee: Wendy
Reporter: Tomas Nykodym
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A