h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

runit_NOPASS_pub_960_glm_aic_R : my reading of jessica's GLM doc implies family = 'tweedie' should work as before. It apparently doesn't? #13732

Closed by exalate-issue-sync[bot] 1 year ago

exalate-issue-sync[bot] commented 1 year ago

http://mr-0xb1:8080/view/All/job/h2o_runit_nopass_serial/96/testReport/runit_NOPASS_pub_960_glm_aic/R/runit_NOPASS_pub_960_glm_aic_R/

[2015-04-05 17:05:39] [ERROR] : Error: Test failed: 'Testing AIC value for GLM families gamma and tweedie'
Not expected: "family" must be in "gaussian", "binomial", "poisson", "gamma", but got tweedie return() })
9: tryCatchList(expr, classes, parentenv, handlers)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: doTryCatch(return(expr), name, parentenv, handler)
12: paste0(.h2o.__JOBS, "/", job_key)
13: .h2o.startModelJob(conn, algo, params, envir)
14: stop(error)
15: .handleSimpleError(function (e) { e$calls <- head(sys.calls()[-seq_len(frame + 7)], -2) signalCondition(e) }, "\"family\" must be in \"gaussian\", \"binomial\", \"poisson\", \"gamma\", but got tweedie", quote(.h2o.startModelJob(conn, algo, params, envir))).

SEED used: 2112267136

[2015-04-05 17:05:39] [ERROR] : TEST FAILED No traceback available

from jessica's doc: GLM

The following parameters have been renamed, but retain the same functions:

H2O Parameter Name    H2O-Dev Parameter Name
data                  training_frame
key                   destination_key
prior                 prior1
nfolds                n_folds
nlambda               nlambdas
lambda.min.ratio      lambda_min_ratio
iter.max              max_iter
epsilon               beta_eps
beta_constraints      beta_constraint
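For anyone porting old test scripts, the renames listed above can be applied mechanically. A minimal sketch in base R, assuming the arguments are held in a named list; `translate_glm_args` and the example values are hypothetical and not part of the h2o package:

```r
# Old H2O name -> H2O-Dev name, taken from the renamed-parameter list above.
renamed <- c(
  data             = "training_frame",
  key              = "destination_key",
  prior            = "prior1",
  nfolds           = "n_folds",
  nlambda          = "nlambdas",
  lambda.min.ratio = "lambda_min_ratio",
  iter.max         = "max_iter",
  epsilon          = "beta_eps",
  beta_constraints = "beta_constraint"
)

# Hypothetical helper: rename entries of a named argument list.
# Names that are not in the mapping are left untouched.
translate_glm_args <- function(args) {
  old <- names(args)
  hit <- old %in% names(renamed)
  names(args)[hit] <- unname(renamed[old[hit]])
  args
}

# Illustrative values only.
old_args <- list(x = 1:3, y = 4, data = "prostate.hex", nfolds = 0, iter.max = 100)
new_args <- translate_glm_args(old_args)
print(names(new_args))
```

This only handles renames. Removed parameters (for example `return_all_lambda`) would still need to be dropped by hand.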

The following parameters have been removed:

return_all_lambda: A logical value indicating whether to return every model built during the lambda search.
higher_accuracy: A logical value indicating whether to use line search.
strong_rules: Discards predictors likely to have 0 coefficients prior to model building.
intercept: Defines factor columns in the model.
non_negative: Specify a non-negative response.
variable_importances: Variable importances are now computed automatically and displayed in the model output. They have been renamed to Normalized Coefficient Magnitudes.
disable_line_search: Disables line search for faster model building.
offset: Specify a column as an offset.
max_predictors: Stops training the algorithm if the number of predictors exceeds the specified value.

The following parameters have been added:

class_sampling_factors:
validation_frame: Specify the validation dataset.
balance_classes: For imbalanced data, balance training data class counts via over/under-sampling for improved predictive accuracy.
max_after_balance_size: If classes are balanced, limit the resulting dataset size to the specified multiple of the original dataset size.
solver: Select ADMM or LBFGS.

N-fold cross-validation and grid search will be supported in a future version of H2O-Dev.

H2O:

    h2o.glm <- function(
      x,
      y,
      data,
      key = "",
      family,
      link,
      tweedie.p = ifelse(family == "tweedie", 1.5, NA_real_),
      prior = NULL,
      nfolds = 0,
      alpha = 0.5,
      lambda = 1e-5,
      lambda_search = FALSE,
      nlambda = -1,
      lambda.min.ratio = -1,
      standardize = TRUE,
      iter.max = 100,
      epsilon = 1e-4,
      use_all_factor_levels = FALSE,
      beta_constraints = NULL,
      return_all_lambda = FALSE,
      higher_accuracy = FALSE,
      strong_rules = TRUE,
      intercept = TRUE,
      non_negative = FALSE,
      variable_importances = FALSE,
      disable_line_search = FALSE,
      offset = NULL,
      max_predictors = -1)

H2O-Dev:

    h2o.glm <- function(
      x,
      y,
      training_frame,
      destination_key,
      family = c("gaussian", "binomial", "poisson", "gamma", "tweedie"),
      link = c("family_default", "identity", "logit", "log", "inverse", "tweedie"),
      tweedie_variance_power = NaN,
      tweedie_link_power = NaN,
      prior1 = 0.0,
      n_folds = 0,
      alpha = 0.5,
      lambda = 1e-05,
      lambda_search = FALSE,
      nlambdas = -1,
      lambda_min_ratio = 1.0,
      standardize = TRUE,
      max_iter = 50,
      beta_eps = 0,
      use_all_factor_levels = FALSE,
      beta_constraint = NULL,
      class_sampling_factors,
      validation_frame,
      balance_classes = FALSE,
      max_after_balance_size = 5.0,
      solver = c("ADMM", "L_BFGS"),
      ...
    )

exalate-issue-sync[bot] commented 1 year ago

Neeraja Madabhushi commented: Tweedie tests should be NOFEATURE tests.

Jessica is going to update documentation

exalate-issue-sync[bot] commented 1 year ago

Tomas Nykodym commented: We do not produce an AIC metric for Tweedie, and there is currently no plan to do so. R's glm does not produce AIC for Tweedie either.
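To illustrate the R side of that comment without third-party packages, here is a rough base-R sketch. It uses a `quasi` family as a stand-in for Tweedie (an assumption made for illustration; in R a Tweedie family usually comes from an add-on package, which is not used here). The simulated data and variable names are made up. The point is only that `glm` reports a finite AIC for a full-likelihood family such as Gamma, while a quasi-likelihood fit has no AIC:

```r
# Simulated positive response; purely illustrative data.
set.seed(1)
x <- runif(50)
y <- rgamma(50, shape = 2, rate = 2 / exp(0.5 + x))

# Full-likelihood family: an AIC is defined.
fit_gamma <- glm(y ~ x, family = Gamma(link = "log"))

# Quasi-likelihood family (stand-in for Tweedie): no likelihood, so no AIC.
fit_quasi <- glm(y ~ x, family = quasi(link = "log", variance = "mu^2"))

aic_gamma <- AIC(fit_gamma)
aic_quasi <- AIC(fit_quasi)
print(c(gamma = aic_gamma, quasi = aic_quasi))
```

A test that compares AIC values against R would therefore need to skip families of this kind rather than expect a number.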

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-742
Assignee: Tomas Nykodym
Reporter: Neeraja Madabhushi
State: Closed
Fix Version: N/A
Attachments: N/A
Development PRs: N/A