h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

H2O-3 Backup models in case user shut down cluster #6587

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This request comes from the HAIC team. Use case: H2O-3 clusters cannot be restarted, so if a cluster shuts down and the user has not manually saved their work, their data and models are lost permanently.

A possible solution is to always set export_checkpoints_dir to a global backup directory.
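
Until something like this exists on the server side, the idea can be approximated on the client. The following is only a sketch: the `H2O_BACKUP_DIR` environment variable, the `/tmp/h2o_backup` fallback path, and the `with_backup_dir` helper are hypothetical names made up for illustration. The only H2O name it relies on is the existing `export_checkpoints_dir` estimator parameter.

```python
import os

# Hypothetical global backup location. Both the environment variable name
# and the fallback path are assumptions for this sketch, not H2O settings.
DEFAULT_BACKUP_DIR = os.environ.get("H2O_BACKUP_DIR", "/tmp/h2o_backup")


def with_backup_dir(params, backup_dir=DEFAULT_BACKUP_DIR):
    """Return a copy of estimator params with export_checkpoints_dir filled in.

    A directory the user already chose is left untouched; the backup
    directory is applied only when the key is missing or set to None.
    """
    merged = dict(params)  # copy so the caller's dict is not mutated
    if merged.get("export_checkpoints_dir") is None:
        merged["export_checkpoints_dir"] = backup_dir
    return merged
```

The result would be passed to an estimator, for example `H2OGradientBoostingEstimator(**with_backup_dir({"ntrees": 50}))`. Note that `export_checkpoints_dir` exports generated models, so this sketch would not back up data frames, which the use case above also mentions.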

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8850
Assignee: New H2O Bugs
Reporter: Adam Valenta
State: Open
Fix Version: 3.42.0.1
Attachments: N/A
Development PRs: N/A