exalate-issue-sync[bot] opened 1 year ago
Clem Wang commented: logloss is another function that fails:
{noformat}/usr/local/lib/python3.6/dist-packages/h2o/model/metrics_base.py in logloss(self)
    176     def logloss(self):
    177         """Log loss."""
--> 178         return self._metric_json["logloss"]
    179
    180

KeyError: 'logloss'{noformat}
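The failure is easy to see in isolation: the metrics object stores its values in a plain dict (`_metric_json`), and the accessor indexes it directly, so any key absent for the given model category raises KeyError. A minimal sketch of a `dict.get`-based guard a caller can use in the meantime (the `metric_json` dict below is a stand-in, not real H2O output):

```python
# Stand-in for perf._metric_json from a model category that lacks 'logloss'
metric_json = {"MSE": 0.04, "RMSE": 0.2, "r2": 0.91}

def safe_metric(metric_json, key):
    """Return the metric value if present, else None (no KeyError)."""
    return metric_json.get(key)

print(safe_metric(metric_json, "RMSE"))     # 0.2
print(safe_metric(metric_json, "logloss"))  # None
```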
This is too bad, because logloss is a supported sort option for GridSearch:
{code:python}xgboost_grid2_perf = xgboost_grid2.get_grid(sort_by='logloss',   # rmse
                                            decreasing=False)  # Lower is better for Logloss{code}
JIRA Issue Migration Info
Jira Issue: PUBDEV-7333
Assignee: New H2O Bugs
Reporter: Clem Wang
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Short version:
{code:python}/usr/local/lib/python3.6/dist-packages/h2o/model/metrics_base.py in auc(self)
    191     def auc(self):
    192         """The AUC for this set of metrics."""
--> 193         return self._metric_json['AUC']

KeyError: 'AUC'{code}
but if you inspect the keys of _metric_json, you can see that many of the keys needed by the corresponding member functions are missing:
{code:python}._metric_json.keys()

dict_keys(['__meta', 'model', 'model_checksum', 'frame', 'frame_checksum', 'description', 'model_category', 'scoring_time', 'predictions', 'MSE', 'RMSE', 'nobs', 'custom_metric_name', 'custom_metric_value', 'r2', 'hit_ratio_table', 'cm', 'logloss', 'mean_per_class_error']){code}
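Given that key listing, you can check directly which lookups will raise. A small illustration using the exact keys printed above (the accessor-to-key mapping follows the tracebacks: auc() reads 'AUC', logloss() reads 'logloss'):

```python
# Keys copied verbatim from the _metric_json listing above
present = {'__meta', 'model', 'model_checksum', 'frame', 'frame_checksum',
           'description', 'model_category', 'scoring_time', 'predictions',
           'MSE', 'RMSE', 'nobs', 'custom_metric_name', 'custom_metric_value',
           'r2', 'hit_ratio_table', 'cm', 'logloss', 'mean_per_class_error'}

# 'AUC' is missing, so auc() raises; 'logloss' happens to be present for
# this particular metrics object, so logloss() would succeed here
for key in ['AUC', 'logloss', 'MSE', 'RMSE']:
    print(key, 'ok' if key in present else 'KeyError')
```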
These member functions fail for the same reason: aucpr, aic, gini, null_deviance. There might be others.
Either the data needs to be added back to the JSON, or the broken member functions need to go away (be marked deprecated).
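One possible shape for the accessor-side fix, purely as a sketch (the function name and behavior here are hypothetical, not H2O's actual implementation): warn and return None when the metric is absent for the model category, so callers degrade gracefully instead of crashing:

```python
import warnings

def metric_or_none(metric_json, key):
    """Hypothetical accessor body: warn and return None when the metric
    is absent for this model category, instead of raising KeyError."""
    if key not in metric_json:
        warnings.warn(f"'{key}' is not available for this model category")
        return None
    return metric_json[key]

# With a multinomial-style payload, 'AUC' degrades to None
payload = {'logloss': 0.12, 'MSE': 0.04}
print(metric_or_none(payload, 'logloss'))  # 0.12
print(metric_or_none(payload, 'AUC'))      # None (plus a UserWarning)
```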
In case you want reproducible code:
{code:python}import h2o
h2o.init()
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
iris["class"] = iris["class"].asfactor()

from h2o.estimators import H2OXGBoostEstimator
from h2o.grid.grid_search import H2OGridSearch

xgboost_params = {
    "ntrees": [3, 4],                        # ,5,6,7,8]
    "max_depth": [3, 4],                     # ,5,6,7,8,9,10,11]
    "learn_rate": [0.01, 0.02],              # ,0.04]
    "sample_rate": [0.5, 0.6, 0.7],          # ,0.8]
    "col_sample_rate_per_tree": [0.5, 0.6],  # ,0.7,0.8,0.9]
    "min_rows": [3, 4],                      # ,6,7]
    "seed": [42],
}
search_criteria = {'strategy': 'RandomDiscrete', 'max_models': 20, 'seed': 1}

xgboost_grid1 = H2OGridSearch(model=H2OXGBoostEstimator,
                              grid_id='xgboost_grid_cartesian',
                              hyper_params=xgboost_params,
                              search_criteria=search_criteria)
x = iris.columns[:-1]  # predictors; 'x' was undefined in the original snippet
xgboost_grid1.train(x=x, y='class', training_frame=iris, validation_frame=iris, seed=42)

# Get the grid results, sorted by validation logloss
xgboost_grid1_perf = xgboost_grid1.get_grid(sort_by='logloss', decreasing=True)  # would be nice to use 'auc'
print(xgboost_grid1_perf)

# Grab the top XGB model
best_xgb1 = xgboost_grid1_perf.models[0]

# Now let's evaluate the model performance on a test set
# so we get an honest estimate of top model performance
best_xgb1_perf1 = best_xgb1.model_performance(iris)
print(best_xgb1_perf1._metric_json)
best_xgb1_perf1.auc()  # raises KeyError: 'AUC'
{code}