H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
On H2O 3.30.0.1, if a binary model is saved with a validation dataset, the actual value of the validation dataset key is not stored (the same is true for the training dataset). The expectation, though, is that you could retrieve the validation frame key from your binary model even when starting a fresh cluster.
To Reproduce:
Build and save a binary model
{code}
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()
# import the covtype dataset:
# this dataset is used to classify the correct forest cover type
# the original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Covertype
covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")
# convert the response column to a factor
covtype[54] = covtype[54].asfactor()
# set the predictor names and the response column name
predictors = covtype.columns[0:54]
response = 'C55'
# split into train and validation sets
train, valid = covtype.split_frame(ratios = [.8], seed = 1234)
# try using the balance_classes parameter (set to True):
model = H2OGradientBoostingEstimator(balance_classes=True, seed=1234)
model.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

# save the binary model and print its path
binary_path = h2o.save_model(model=model)
print(binary_path)
{code}
Either of these calls will return None for the validation frame (and likewise for the training frame):
{code}
model.actual_params["validation_frame"]
model._model_json["parameters"][2]['actual_value']
{code}
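Until the frame keys are persisted with the binary model, one possible workaround is to record them in a sidecar file written next to the saved model. The helper names below (save_frame_keys, load_frame_keys) are hypothetical, not part of the h2o API; this is a minimal sketch:

{code}
import json

def save_frame_keys(model_path, training_key, validation_key):
    # record the frame keys in a sidecar JSON file next to the saved model
    sidecar = model_path + ".frames.json"
    with open(sidecar, "w") as f:
        json.dump({"training_frame": training_key,
                   "validation_frame": validation_key}, f)
    return sidecar

def load_frame_keys(model_path):
    # read the frame keys back, e.g. after restarting the cluster
    with open(model_path + ".frames.json") as f:
        return json.load(f)
{code}

After h2o.save_model, one could call save_frame_keys(binary_path, train.frame_id, valid.frame_id) and recover the keys later with load_frame_keys(binary_path), even though the model itself reports None.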