h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

h2o.grid with GAM formatting issue and documentation #6563

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

As can be seen in the grid search below, the gam_columns column in the grid search output is not formatted correctly: it shows raw Java array references instead of the column names.

Additionally, the documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gam.html) shows how to specify hyperparameters with Python syntax but doesn’t show the corresponding R syntax; I think that’d be a nice addition. It should perhaps also be mentioned that the argument applies to h2o.grid, not h2o.gam. Also, it wasn’t clear how to use the subspaces argument, but I think what I show below works. Lastly, are the hyperparameters listed here (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#gam-hyperparameters) up to date? I didn’t see spline_orders, for example, and it was unclear what gam_x and k are.

{code:r}
packageVersion("h2o")
# [1] ‘3.36.1.4’

mt <- as.h2o(mtcars)

mt_grid <- h2o.grid(
  algorithm = "gam",
  x = c("cyl", "am"),
  y = "mpg",
  training_frame = mt,
  lambda = 0,
  keep_gam_cols = TRUE,
  hyper_params = list(
    gam_columns = list(c("hp", "drat")),
    num_knots = list(c(3, 4), c(4, 6)),
    spline_orders = list(c(2, 3), c(3, 3)),
    bs = list(c(1, 0)),
    scale = list(c(0.01, 0.01))
  )
)

mt_grid
# H2O Grid Details
# ================
# Grid ID: Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251
# Used hyper parameters:
#   - bs
#   - gam_columns
#   - num_knots
#   - scale
#   - spline_orders
# Number of models: 4
# Number of failed models: 0
# Hyper-Parameter Search Summary: ordered by increasing residual_deviance
#    bs      gam_columns                    num_knots  scale         spline_orders  model_ids                                                       residual_deviance
# 1  [1, 0]  [[Ljava.lang.String;@72ff98fa  [4, 6]     [0.01, 0.01]  [2, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_2  149.78507
# 2  [1, 0]  [[Ljava.lang.String;@4cbbc671  [4, 6]     [0.01, 0.01]  [3, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_4  149.78507
# 3  [1, 0]  [[Ljava.lang.String;@fb38ed    [3, 4]     [0.01, 0.01]  [2, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_1  165.05367
# 4  [1, 0]  [[Ljava.lang.String;@37e4fb87  [3, 4]     [0.01, 0.01]  [3, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_3  165.05367
{code}
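For context, a value such as `[[Ljava.lang.String;@72ff98fa` has the shape of Java's default string form for an array object (`[[` for a two-dimensional array, `Ljava.lang.String;` for the element class, then `@` and a hexadecimal hash), which suggests the nested gam_columns value is being printed without being converted to readable text. The following is only a rough base R sketch for spotting which summary columns are affected; `is_java_array_ref` is a hypothetical helper, not part of the h2o package, and the cell values are copied by hand from the first row of the output above.

```r
# Cell values copied by hand from the first row of the summary table above.
cells <- c(
  bs            = "[1, 0]",
  gam_columns   = "[[Ljava.lang.String;@72ff98fa",
  num_knots     = "[4, 6]",
  scale         = "[0.01, 0.01]",
  spline_orders = "[2, 3]"
)

# Hypothetical helper (not an h2o function): TRUE for strings shaped like a
# Java array reference, i.e. one or more "[", then "L<class name>;",
# then "@<hex hash>".
is_java_array_ref <- function(x) {
  grepl("^\\[+L[A-Za-z0-9_.$]+;@[0-9a-f]+$", x)
}

# Names of the hyper-parameters whose printed value is an array reference.
flagged <- names(cells)[is_java_array_ref(cells)]
print(flagged)
```

With the values above, only gam_columns matches the pattern; the other hyper-parameters are one-dimensional and print as readable lists.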

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8794
Assignee: New H2O Bugs
Reporter: Paul Donnelly
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

hutch3232 commented 1 year ago

I'm the original reporter on Jira. Reposting below with better formatting.


As can be seen in the grid search below, the gam_columns column in the grid search output is not formatted correctly: it shows raw Java array references instead of the column names.

Additionally, the documentation (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/gam.html) shows how to specify hyperparameters with Python syntax but doesn’t show the corresponding R syntax; I think that’d be a nice addition. It should perhaps also be mentioned that the argument applies to h2o.grid, not h2o.gam. Also, it wasn’t clear how to use the subspaces argument, but I think what I show below works. Lastly, are the hyperparameters listed here (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#gam-hyperparameters) up to date? I didn’t see spline_orders, for example, and it was unclear what gam_x and k are.

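library(h2o)
h2o.init()  # start or connect to a local H2O cluster

packageVersion("h2o")
# [1] ‘3.36.1.4’

# Upload the built-in mtcars data set to the cluster.
mt <- as.h2o(mtcars)

# Grid search over GAM hyper-parameters; each list element is one candidate value.
mt_grid <- h2o.grid(
  algorithm = "gam",
  x = c("cyl", "am"),
  y = "mpg",
  training_frame = mt,
  lambda = 0,
  keep_gam_cols = TRUE,
  hyper_params = list(
    gam_columns = list(c("hp", "drat")),
    num_knots = list(c(3, 4), c(4, 6)),
    spline_orders = list(c(2, 3), c(3, 3)),
    bs = list(c(1, 0)),
    scale = list(c(0.01, 0.01))
  )
)

mt_grid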

# H2O Grid Details
# ================
# Grid ID: Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251
# Used hyper parameters:
# - bs
# - gam_columns
# - num_knots
# - scale
# - spline_orders
# Number of models: 4
# Number of failed models: 0
# Hyper-Parameter Search Summary: ordered by increasing residual_deviance
#    bs      gam_columns                    num_knots  scale         spline_orders  model_ids                                                       residual_deviance
# 1  [1, 0]  [[Ljava.lang.String;@72ff98fa  [4, 6]     [0.01, 0.01]  [2, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_2  149.78507
# 2  [1, 0]  [[Ljava.lang.String;@4cbbc671  [4, 6]     [0.01, 0.01]  [3, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_4  149.78507
# 3  [1, 0]  [[Ljava.lang.String;@fb38ed    [3, 4]     [0.01, 0.01]  [2, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_1  165.05367
# 4  [1, 0]  [[Ljava.lang.String;@37e4fb87  [3, 4]     [0.01, 0.01]  [3, 3]         Grid_GAM_mtcars_sid_b77f_224_model_R_1659622051297_251_model_3  165.05367
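
As a sanity check on "Number of models: 4", here is a small base R sketch (no H2O cluster needed) that counts the combinations implied by the hyper_params list. It assumes the grid uses a plain Cartesian search, where every candidate value of one hyper-parameter is paired with every candidate value of the others; I believe that is the default, but treat it as an assumption.

```r
# Hyper-parameter values copied from the h2o.grid() call above.
hyper_params <- list(
  gam_columns   = list(c("hp", "drat")),
  num_knots     = list(c(3, 4), c(4, 6)),
  spline_orders = list(c(2, 3), c(3, 3)),
  bs            = list(c(1, 0)),
  scale         = list(c(0.01, 0.01))
)

# Each element of a hyper-parameter's list is one candidate value, so the
# number of candidates is simply the length of that list.
n_candidates <- lengths(hyper_params)

# Assuming a Cartesian search, one model is trained per combination.
n_models <- prod(n_candidates)

print(n_candidates)
print(n_models)
```

Only num_knots and spline_orders have two candidates each, so the product is 1 * 2 * 2 * 1 * 1 = 4, which matches the four rows in the summary table.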