Do not allow training of a Stacked Ensemble when there is only one model

exalate-issue-sync[bot] commented 1 year ago

There is a bug where a trivial Stacked Ensemble (one built from a single base model) reports different metrics from that individual model. However, rather than fixing this, we should simply throw an error if someone tries to train an SE with a single model, since it makes no sense.

{code:r}
library(h2o)
h2o.init()

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[, y] <- as.factor(train[, y])  # binary classification

# A single base model; cross-validated predictions are required for stacking
my_gbm <- h2o.gbm(x = x, y = y, training_frame = train, distribution = "bernoulli",
                  ntrees = 10, nfolds = 5, keep_cross_validation_predictions = TRUE, seed = 1)
# "Trivial" ensemble containing only that one model
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train, base_models = list(my_gbm))

h2o.auc(ensemble)  # [1] 0.7973669
h2o.auc(my_gbm)    # [1] 0.7973651
{code}

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] I'm not convinced we should throw an error in this case; I would rather print a warning message, because:

- Throwing an error could break some existing user scripts.
- Getting different metrics when there is only one model is not really a bug; it is expected. An SE is a different model from its single base model, after all (its metalearner is trained on the base model's cross-validated predictions), so its metrics need not match exactly. It is just not a very useful configuration.

wdyt?
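The two options discussed here (refusing to train vs. warning and continuing) can be sketched as a user-side guard in R. This is a hypothetical helper, `check_base_models()`, not part of the h2o package; the `strict` flag and the message text are illustrative assumptions, not H2O's actual behavior.

```r
# Hypothetical helper (NOT part of the h2o package): validate the base_models
# list before calling h2o.stackedEnsemble().
check_base_models <- function(base_models, strict = FALSE) {
  n <- length(base_models)
  msg <- sprintf(
    "A Stacked Ensemble was requested with %d base model(s); at least 2 are needed for stacking to be useful.",
    n
  )
  if (n < 2) {
    if (strict) stop(msg, call. = FALSE)  # option 1: refuse to train
    warning(msg, call. = FALSE)           # option 2: warn, then continue
  }
  invisible(n)
}
```

With the default `strict = FALSE`, calling `check_base_models(list(my_gbm))` before `h2o.stackedEnsemble()` would emit a warning but let existing scripts keep running, which matches the concern about not breaking user code; `strict = TRUE` corresponds to the original proposal of throwing an error.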

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] +1

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7195
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

h2oai / h2o-3

Do not allow training of a Stacked Ensemble when there is only one model #8437