exalate-issue-sync[bot] opened 1 year ago
Erin LeDell commented: Would love to get this fixed in 3.34.0.1! I got another question about this on Stack Overflow the other day and had to explain that we have two "versions" of CV AUC (the correct one and the not-correct one): https://stackoverflow.com/questions/64032018/retrieve-cross-validation-performance-auc-on-h2o-automl-for-holdout-dataset/64057390#64057390
JIRA Issue Migration Info
Jira Issue: PUBDEV-4975
Assignee: Michal Kurka
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Currently, H2O stores two different versions of the cross-validation metrics. The first version is stored in the "Cross-validation metrics summary" table: these are the true CV metrics, computed per fold and averaged across the folds. The second version is a single set of metrics computed once over the pooled holdout predictions from all folds. The second version is what you get from the h2o.performance() function and from the individual accessors such as h2o.auc(), so most users end up with the "incorrect" one. Storing two different point estimates of the same quantity in one model is bad practice.

Computing the metrics once over the aggregated CV predictions was originally motivated by the need to plot the ROC curve easily as a single curve (instead of several ROC curves, one per fold). That is not a good enough reason to report two sets of metrics, especially when the set people use the most (via h2o.performance()) is technically and statistically "incorrect".

For plotting the ROC curve of cross-validated models, we can still use the aggregated predictions, but the values returned by h2o.performance() should pull from the true mean values of those metrics across folds (and the printed output should show the correct ones too).
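To see why the two versions can disagree, here is a minimal sketch in plain Python (not H2O code; the fold data below is made up for illustration). Each fold ranks its own holdout rows perfectly, so the per-fold AUCs average to 1.0, but because the two folds score on different scales, the AUC over the pooled predictions is lower:

```python
def auc(labels, scores):
    # Rank-based (Mann-Whitney) AUC: the probability that a randomly
    # chosen positive outranks a randomly chosen negative (ties count half).
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two hypothetical CV folds: (labels, predicted scores).
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.2]),  # fold 1: perfect ranking, high scores
    ([1, 1, 0, 0], [0.4, 0.3, 0.2, 0.1]),  # fold 2: perfect ranking, low scores
]

# "Correct" CV metric: average of per-fold AUCs.
mean_auc = sum(auc(l, s) for l, s in folds) / len(folds)

# "Incorrect" CV metric: one AUC over the pooled predictions.
pooled_labels = [l for ls, _ in folds for l in ls]
pooled_scores = [s for _, ss in folds for s in ss]
pooled_auc = auc(pooled_labels, pooled_scores)

print(mean_auc, pooled_auc)  # → 1.0 0.875
```

The pooled estimate mixes scores from folds whose models are calibrated differently, which is why it is not a clean point estimate of the per-fold performance.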
Code example:
{code:r}
library(h2o)
h2o.init()

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
y <- "response"
x <- setdiff(names(train), y)

# For binary classification, the response should be a factor
train[, y] <- as.factor(train[, y])

fit <- h2o.gbm(x = x, y = y, training_frame = train, nfolds = 3)
fit@model$cross_validation_metrics_summary  # one set of metrics (correct)

# Comparisons against the cross-validation metrics summary table:
h2o.auc(h2o.performance(fit, xval = TRUE))   # 0.7763038 vs 0.776584 from the xval table
h2o.mse(h2o.performance(fit, xval = TRUE))   # 0.1921713 vs 0.19217622 from the xval table
h2o.rmse(h2o.performance(fit, xval = TRUE))  # 0.4383734 vs 0.43837336 from the xval table
{code}