h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.92k stars 2k forks

Binary Model Doesn't Show Train and Valid Frame Keys #7707

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

On H2O 3.30.0.1, if a binary model is saved with a validation dataset, the actual value for the validation dataset key is not stored (this is also the case for the training dataset). The expectation, though, is that you can retrieve the validation frame key from the binary model even after starting a fresh cluster.

To Reproduce:

  1. Build and save a binary model
{code}
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# import the covtype dataset:
# this dataset is used to classify the correct forest cover type
# original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Covertype
covtype = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")

# convert response column to a factor
covtype[54] = covtype[54].asfactor()

# set the predictor names and the response column name
predictors = covtype.columns[0:54]
response = 'C55'

# split into train and validation sets
train, valid = covtype.split_frame(ratios = [.8], seed = 1234)

# try using the balance_classes parameter (set to True):
model = H2OGradientBoostingEstimator(balance_classes = True, seed = 1234)
model.train(x = predictors, y = response, training_frame = train, validation_frame = valid)

binary_path = h2o.save_model(model=model)
print(binary_path)
{code}

  2. Shut down the cluster, start a new cluster, and load the binary model
{code}
h2o.cluster().shutdown()
h2o.init()
model = h2o.load_model(path=binary_path)

# either of these expressions returns None for the validation frame (and likewise the training frame)
model.actual_params["validation_frame"]
model._model_json["parameters"][2]['actual_value']
{code}
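The positional lookup `["parameters"][2]` in the reproduction depends on the order of the parameter list. A minimal sketch of a by-name lookup follows. It assumes that `model._model_json["parameters"]` is a list of dict-like entries, each carrying a `name` key alongside the `actual_value` key used above. The helper `param_actual_value` and the sample data are hypothetical and only illustrate the check; they are not part of the H2O API.

```python
from typing import Any, Mapping, Optional, Sequence


def param_actual_value(parameters: Sequence[Mapping[str, Any]], name: str) -> Optional[Any]:
    """Return the 'actual_value' of the entry called `name`, or None if it is absent."""
    for entry in parameters:
        if entry.get("name") == name:
            return entry.get("actual_value")
    return None


# Hypothetical stand-in for model._model_json["parameters"] after reloading;
# the None values mirror the behaviour reported in this issue.
sample_parameters = [
    {"name": "model_id", "actual_value": {"name": "GBM_model_1"}},
    {"name": "training_frame", "actual_value": None},
    {"name": "validation_frame", "actual_value": None},
]

print(param_actual_value(sample_parameters, "validation_frame"))  # prints None
```

Against a loaded model, the call would presumably be `param_actual_value(model._model_json["parameters"], "validation_frame")`, which avoids relying on index 2 pointing at the validation frame.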

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7938
Assignee: Adam Valenta
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A