We should change RF so that when you ask for training metrics, you get actual training metrics instead of OOB metrics. Right now, if you extract “train” metrics, they are OOB metrics, and the only way to get the real training metrics is to recreate them manually with h2o.performance() on the training set. This is confusing to users, inconsistent with our definition of “training metrics”, and possibly has adverse effects on our ability to do early stopping.
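For context on the early-stopping concern, here is a minimal R sketch (illustrative only, not from the original report; parameter values are arbitrary) of a DRF run where metric-based early stopping has no validation frame and therefore tracks the reported “training” metrics, which today are OOB:

{code:r}
library(h2o)
h2o.init()
train <- as.h2o(iris)

# Early stopping with no validation frame: the stopping metric is monitored
# on the reported "training" metrics, which are currently OOB metrics.
fit_es <- h2o.randomForest(x = 1:4, y = 5, training_frame = train,
                           ntrees = 200, score_tree_interval = 5,
                           stopping_rounds = 3, stopping_metric = "logloss",
                           stopping_tolerance = 1e-3, seed = 1234)

# Reported "training" metric (currently OOB) vs. the same metric recomputed
# directly on the training frame:
h2o.logloss(fit_es, train = TRUE)
h2o.logloss(h2o.performance(fit_es, newdata = train))
{code}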
Note: when making the change, make sure that checkpointing still works as expected.
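For reference, a minimal sketch (illustrative only) of the checkpointing path that should keep working: continue a DRF from an existing model and read the metrics off the continued model.

{code:r}
library(h2o)
h2o.init()
train <- as.h2o(iris)

# Build an initial forest, then continue it from the checkpoint with a
# larger ntrees. The continued model should still report sensible
# training metrics after the proposed change.
fit_base <- h2o.randomForest(x = 1:4, y = 5, training_frame = train,
                             ntrees = 20, seed = 1234)
fit_cont <- h2o.randomForest(x = 1:4, y = 5, training_frame = train,
                             ntrees = 50, seed = 1234,
                             checkpoint = fit_base@model_id)

h2o.mse(fit_cont, train = TRUE)
h2o.mse(h2o.performance(fit_cont, newdata = train))
{code}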
More details:
Currently, if you look up the evaluation metrics for Random Forest:
(using: 3.20.0.3 - DRF + iris - default settings)
OOB evaluation metrics will be used when you ask for the training metrics (it doesn't seem like passing in a validation frame or specifying nfolds changes this):
In Flow it will state OUTPUT - Training_Metrics
In R (same for Python), h2o.mse(fit, train = TRUE) returns the OOB metrics instead of the training metrics. To get the actual training metrics you would need to run h2o.mse(h2o.performance(fit, newdata = as.h2o(iris))).
Code Snippet for testing:
{code:r}
library(h2o)
h2o.init()

# When only the training frame is supplied:
# h2o.mse(fit, train = TRUE) != h2o.mse(h2o.performance(fit, newdata = as.h2o(iris)))
fit <- h2o.randomForest(x = 1:4, y = 5, training_frame = as.h2o(iris), seed = 1234)
h2o.mse(fit, train = TRUE)
h2o.mse(h2o.performance(fit, newdata = as.h2o(iris)))

# It doesn't seem that providing nfolds changes anything
fit2 <- h2o.randomForest(x = 1:4, y = 5, training_frame = as.h2o(iris), nfolds = 3, seed = 1234)
h2o.mse(fit2, train = TRUE)
h2o.mse(fit2, xval = TRUE)
h2o.mse(h2o.performance(fit2, newdata = as.h2o(iris)))

# Split the data to see if passing a validation frame changes anything
iris.split <- h2o.splitFrame(as.h2o(iris), ratios = c(0.2, 0.5))
train1 <- iris.split[[1]]
valid1 <- iris.split[[2]]

# Check when only a training frame is supplied
fit2a <- h2o.randomForest(x = 1:4, y = 5, training_frame = train1, seed = 1234)
h2o.mse(fit2a, train = TRUE)
h2o.mse(h2o.performance(fit2a, newdata = train1))

# When supplying a validation frame, should h2o.mse(fit3, train = TRUE)
# be equal to h2o.mse(h2o.performance(fit3, newdata = train1))?
fit3 <- h2o.randomForest(x = 1:4, y = 5, training_frame = train1, validation_frame = valid1, seed = 1234)
h2o.mse(fit3, train = TRUE)
h2o.mse(fit3, valid = TRUE)
h2o.mse(h2o.performance(fit3, newdata = train1))
{code}