h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Do not allow training of a Stacked Ensemble when there is only one model #8437

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

There is a bug: a trivial Stacked Ensemble (one with a single base model) reports different metrics than the individual model. However, rather than fixing this, we should just throw an error if someone tries to train an SE with a single model, since it makes no sense.

{code:r}
library(h2o)
h2o.init()

train <- h2o.importFile("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
y <- "response"
x <- setdiff(names(train), y)
train[,y] <- as.factor(train[,y])

my_gbm <- h2o.gbm(x = x, y = y, training_frame = train,
                  distribution = "bernoulli", ntrees = 10, nfolds = 5,
                  keep_cross_validation_predictions = TRUE, seed = 1)
ensemble <- h2o.stackedEnsemble(x = x, y = y, training_frame = train,
                                base_models = list(my_gbm))

h2o.auc(ensemble)
# [1] 0.7973669
h2o.auc(my_gbm)
# [1] 0.7973651
{code}
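Until a check like this exists in H2O itself, the idea can be approximated on the client side. The following is only a sketch: `check_base_models` and its `on_single` argument are hypothetical names, not part of the h2o package, and the message text is an assumption. It inspects the list of base models before it is handed to `h2o.stackedEnsemble` and either warns or stops when the list holds exactly one model.

```r
# Hypothetical helper (NOT part of the h2o package): validates the list of
# base models before it is passed to h2o.stackedEnsemble().
# on_single = "warning" emits a warning; on_single = "error" stops.
check_base_models <- function(base_models, on_single = c("warning", "error")) {
  on_single <- match.arg(on_single)
  if (length(base_models) == 1) {
    # Message wording is an assumption made for this sketch.
    msg <- paste("Stacked Ensemble requested with a single base model;",
                 "stacking adds nothing over that model.")
    if (on_single == "error") stop(msg) else warning(msg)
  }
  # Return the list unchanged so the call can be used inline.
  invisible(base_models)
}
```

With the reproduction above, the guard could be used inline as `base_models = check_base_models(list(my_gbm))`, which would warn before training, or with `on_single = "error"` to refuse outright.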

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: [~accountid:557058:afd6e9a4-1891-4845-98ea-b5d34a2bc42c] I’m not convinced we should throw an error in this case; I’d rather print a warning message:

wdyt?

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: [~accountid:5b153fb1b0d76456f36daced] +1

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7195
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A