h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Fix display of array-valued entries in TwoDimTables such as grid search results #10500

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This is how grid TwoDimTables look in Java and R when a parameter is an array:

{code}
Hyper-Parameter Search Summary: ordered by increasing logloss
  activation hidden      model_ids              logloss
1 Rectifier  [I@72260a7f deepwater_grid_model_1 0.23998280650670897
2 Tanh       [I@59c1aeb2 deepwater_grid_model_0 0.2720063731222653
3 Tanh       [I@4fb8d080 deepwater_grid_model_2 0.32258577831421775
4 Rectifier  [I@99673ce  deepwater_grid_model_3 1.2485071090736057
{code}

Note: the Python client also displays this incorrectly; that is tracked as a separate bug: https://0xdata.atlassian.net/browse/PUBDEV-3592

Two Dim Tables should print out arrays properly and this fix is needed in the backend:

{code}
diff --git a/h2o-core/src/main/java/water/util/TwoDimTable.java b/h2o-core/src/main/java/water/util/TwoDimTable.java
index 53b48a8..43a3e58 100644
--- a/h2o-core/src/main/java/water/util/TwoDimTable.java
+++ b/h2o-core/src/main/java/water/util/TwoDimTable.java
@@ -229,6 +229,8 @@ public class TwoDimTable extends Iced {
       cellValues[row][col] = new IcedWrapper(null);
     else if (o instanceof Double && Double.isNaN((double)o))
       cellValues[row][col] = new IcedWrapper(Double.NaN);
{code}

Then the grid TwoDimTable is fine:
{code}
INFO: GET /99/Grids/deepwater_grid, parms: {}
INFO: Hyper-Parameter Search Summary (ordered by increasing logloss):
INFO: activation hidden       learning_rate model_ids              logloss
INFO: Rectifier  [20, 20]     0.0031        deepwater_grid_model_0 0.2245598137094568
INFO: Rectifier  [50, 50, 50] 0.0081        deepwater_grid_model_1 1.1129464114125345
{code}
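For illustration, here is a minimal Java sketch of why an `int[]` cell prints as `[I@72260a7f` and how it can be rendered as `[20, 20]` instead. It is not the actual patch, whose added lines are not shown above; `ArrayCellDemo` and `renderCell` are hypothetical names that do not exist in H2O. The sketch assumes only that a cell value arrives as a plain `Object`.

```java
import java.util.Arrays;

public class ArrayCellDemo {
    // Hypothetical helper (not part of H2O): turn a table cell value into display text.
    // Java arrays do not override toString(), so an int[] prints as "[I@<hash>"
    // unless it is formatted explicitly.
    static String renderCell(Object o) {
        if (o instanceof int[]) return Arrays.toString((int[]) o);
        if (o instanceof long[]) return Arrays.toString((long[]) o);
        if (o instanceof double[]) return Arrays.toString((double[]) o);
        if (o instanceof Object[]) return Arrays.deepToString((Object[]) o);
        return String.valueOf(o); // scalars, strings and null fall through unchanged
    }

    public static void main(String[] args) {
        int[] hidden = {50, 50, 50};
        // Default rendering: type tag plus identity hash, as seen in the broken table.
        System.out.println(String.valueOf((Object) hidden));
        // Explicit rendering: the element list, as seen in the fixed table.
        System.out.println(renderCell(hidden));
    }
}
```

Converting the array to a string on the backend changes what clients receive for that cell, so any client that parsed the old value needs to handle a plain string.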

But then grids with int[] in the parameters (such as hidden) look like this in R:

{code}
h2o.getGrid("mygrid")
H2O Grid Details

Grid ID: mygrid
Used hyper parameters:

Hyper-Parameter Search Summary: ordered by increasing logloss
hidden loss
, , , c(20, 20), c(50, 50, 50), c(20, 20), c(50, 50, 50) Quadratic, Quadratic, CrossEntropy, CrossEntropy
model_ids logloss
, , , mygrid_model_0, mygrid_model_1, mygrid_model_2, mygrid_model_3 0.0992450185385188, 0.10615555194039068, 0.15285186191991842, 0.33590412228168504
{code}

The issue is this code: https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/communication.R#L401-L454

How to reproduce:

Apply the above patch to Java and run this code in R:
{code}
library(h2o)
h2o.init()
train <- as.h2o(iris)
predictors=1:4
response_col=5
m <- h2o.deepwater(x=predictors, y=response_col, training_frame=train)
m

hidden_opts <- list(c(20, 20), c(50, 50, 50), c(50,50,50,50,50))
activation_opts <- c("tanh", "rectifier")
learnrate_opts <- seq(1e-4, 1e-2, 1e-3)
momentum_opts <- seq(0, 1, 1e-3)
max_models <- 5
nfolds <- 2
seed <- 421
max_runtime_secs <- 20

hyper_params <- list(activation = activation_opts, hidden = hidden_opts, rate = learnrate_opts)
search_criteria = list(strategy = "RandomDiscrete",
                       max_models = max_models,
                       seed = seed,
                       max_runtime_secs = max_runtime_secs,
                       stopping_rounds=5,          ## enable early stopping of the overall leaderboard
                       stopping_metric="logloss",
                       stopping_tolerance=1e-4)

dw_grid = h2o.grid("deeplearning",
                   grid_id="grid",
                   x=predictors,
                   y=response_col,
                   training_frame=train,
                   epochs=5,                       ## long enough to allow early stopping
                   nfolds=nfolds,
                   stopping_rounds=3,              ## enable early stopping of each model in the hyperparameter search
                   stopping_metric="logloss",
                   stopping_tolerance=1e-3,        ## stop once validation logloss of the cv models doesn't improve enough
                   hyper_params=hyper_params,
                   search_criteria = search_criteria)

h2o.getGrid("grid")
{code}

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: Issue was previously reported as: https://0xdata.atlassian.net/browse/PUBDEV-3554 But I'll close that ticket as a duplicate.

exalate-issue-sync[bot] commented 1 year ago

Arno Candel commented: Fixed by https://github.com/h2oai/h2o-3/pull/396
{code}
Grid ID: grid
Used hyper parameters:

Hyper-Parameter Search Summary: ordered by increasing logloss
  activation hidden               rate   model_ids    logloss
1 Tanh       [50, 50, 50]         0.0061 grid_model_1 0.15752567956543098
2 Rectifier  [50, 50, 50]         1.0E-4 grid_model_4 0.16815431186904875
3 Rectifier  [20, 20]             0.0021 grid_model_0 0.23909406594950128
4 Rectifier  [50, 50, 50, 50, 50] 0.0081 grid_model_2 0.2627626972723024
5 Rectifier  [50, 50, 50, 50, 50] 1.0E-4 grid_model_3 0.5878911990473336
{code}

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3593
Assignee: Arno Candel
Reporter: Arno Candel
State: Resolved
Fix Version: 3.10.0.9
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/396