h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add better error message to h2o.ensemble for misspecified custom learner wrappers #9969

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

When a user tries to make a custom wrapper by wrapping h2o.glm instead of h2o.glm.wrapper, for example, the error message is not informative. We should do a check to make sure a wrapper function is correct and stop the user if it's not. Example below.

The problem is that h2o.ensemble passes a keep_cross_validation_folds argument (a typo of the real argument name) to the learner wrapper. This argument no longer passes through silently, so a wrapper that forwards it directly to h2o.glm fails with an "unused argument" error.

Adding this custom learner wrapper triggers the bug:
{code}
h2o.glm.lbfgs <- function(..., solver = "L_BFGS") {
  h2o.glm(..., solver = solver)
}
{code}

{code}
Error in h2o.glm(..., solver = solver) :
  unused argument (keep_cross_validation_folds = TRUE)
9 h2o.glm(..., solver = solver)
8 match.fun(learner[l])(y = y, x = x, training_frame = training_frame, validation_frame = NULL, family = family, fold_column = fold_column, keep_cross_validation_folds = TRUE)
7 .fitFun(l, y, xcols, training_frame, validation_frame, family, learner, seed, fold_column)
6 system.time(fit <- .fitFun(l, y, xcols, training_frame, validation_frame, family, learner, seed, fold_column), gcFirst = FALSE)
5 FUN(X[[i]], ...)
4 lapply(X = X, FUN = FUN, ...)
3 sapply(X = 1:L, FUN = .fitWrapper, y = y, xcols = x, training_frame = training_frame, validation_frame = NULL, family = family, learner = learner, seed = seed, fold_column = "fold_id", simplify = FALSE)
2 .make_Z(x = x, y = y, training_frame = training_frame, family = family, learner = learner, parallel = parallel, seed = seed, V = V, L = L, idxs = idxs, metalearner_type = metalearner_type)
1 h2o.ensemble(x = x, y = y, training_frame = train, family = family, learner = learner, metalearner = metalearner, cvControl = list(V = 5, shuffle = TRUE))
Timing stopped at: 0.003 0 0.002
{code}
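One possible shape for the proposed check is sketched below. It is not the h2oEnsemble implementation: `call_learner`, `base_fit`, and `bad_wrapper` are hypothetical names, and the stand-in functions replace h2o.glm and the custom wrapper so the example runs without an H2O cluster. The idea is to catch the "unused argument" failure at the point where the learner is called and rethrow it with a hint. It assumes R's error messages are in English, since it matches on the message text.

```r
# Hypothetical helper (not part of h2oEnsemble): call a learner wrapper by name
# and turn an "unused argument" failure into a more descriptive error.
call_learner <- function(learner_name, ...) {
  fit_fun <- match.fun(learner_name)
  tryCatch(
    fit_fun(...),
    error = function(e) {
      msg <- conditionMessage(e)
      # Assumption: messages are in English; translated locales would not match.
      if (grepl("unused argument", msg, fixed = TRUE)) {
        stop(sprintf(
          paste0("Learner wrapper '%s' rejected an argument passed by the ensemble: %s. ",
                 "Custom wrappers should wrap a learner wrapper such as h2o.glm.wrapper, ",
                 "not a base function such as h2o.glm."),
          learner_name, msg
        ), call. = FALSE)
      }
      stop(e)  # Any other error is rethrown unchanged.
    }
  )
}

# Stand-ins for illustration only: base_fit plays the role of h2o.glm (no `...`),
# bad_wrapper plays the role of the misspecified custom wrapper.
base_fit <- function(x, y, solver = "IRLSM") list(x = x, y = y, solver = solver)
bad_wrapper <- function(..., solver = "L_BFGS") base_fit(..., solver = solver)

result <- tryCatch(
  call_learner("bad_wrapper", x = 1, y = 2, keep_cross_validation_folds = TRUE),
  error = function(e) conditionMessage(e)
)
print(result)
```

Inspecting the wrapper's formals up front would not catch this case, because the wrapper itself accepts `...`. The failure only appears when the extra argument reaches the inner function, which is why the sketch intercepts the error at call time.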

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3046
Assignee: Erin LeDell
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A