h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add checkpointing to AutoML #9002

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We have a feature request from the AutoML community: some other tools can already do checkpointing (TPOT, with auto-sklearn working to add it right now). We could add this functionality to H2O AutoML, since all of our algorithms support checkpointing in some way.

exalate-issue-sync[bot] commented 1 year ago

Michael commented: Two related things would help with this feature:

Saving an AutoML run. Currently, you can add models to an existing AutoML run by calling AutoML again with the same project name. But to stop and restart a run repeatedly, you first need an easy way to save and reload the AutoML run.

Ability to use a different training data frame on repeated runs. At least some of the H2O algorithms can be checkpointed and then trained further on different training data (so long as it has the same columns and column types). One use case: you receive additional data once a week, and you either continue training the original AutoML models from their checkpoints or build additional models on the combined old and new data, to see whether the leaderboard changes given the additional data.
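For reference, the two existing building blocks mentioned above can be sketched in the Python client as follows. This is a minimal sketch, not a proposed implementation of the feature: the helper names `extend_automl_run` and `continue_gbm` are hypothetical, GBM is picked only as one example of a checkpointable algorithm, and the code assumes an H2O cluster is already running and the frames are already loaded.

```python
def extend_automl_run(train, y, project_name, max_models=5):
    """Run AutoML under `project_name` (hypothetical helper).

    Calling this again with the same project name is expected to add
    models to the existing leaderboard for that project.
    """
    # Lazy import so the sketch can be loaded without a running H2O cluster.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(max_models=max_models, project_name=project_name)
    # x is omitted, so every column other than y is used as a predictor.
    aml.train(y=y, training_frame=train)
    return aml


def continue_gbm(base_model, new_frame, y, total_trees):
    """Continue training a GBM from a checkpoint on a new frame (hypothetical helper).

    Assumes `new_frame` has the same columns and column types as the
    original training data, and that `total_trees` is larger than the
    number of trees in `base_model`. Other hyperparameters are left at
    their defaults here and must be compatible with the checkpoint.
    """
    from h2o.estimators import H2OGradientBoostingEstimator

    gbm = H2OGradientBoostingEstimator(
        ntrees=total_trees,
        checkpoint=base_model.model_id,  # start from the existing model
    )
    gbm.train(y=y, training_frame=new_frame)
    return gbm
```

In the weekly-data use case, one would call `continue_gbm` with the new week's frame, or call `extend_automl_run` on the combined frame and compare `aml.leaderboard` between runs. Neither helper saves or reloads the AutoML run itself, which is the gap this issue describes.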

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6630
Assignee: UNASSIGNED
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A