h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

GLRM makes cluster unhealthy #8870

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I did a git diff between 3.22.1.4 (the last working version) and today's master (Aug 5, 2019) and got the following:

{{diff --git a/h2o-algos/src/main/java/hex/glrm/GLRM.java b/h2o-algos/src/main/java/hex/glrm/GLRM.java
index 06880124ad..38511d65d8 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRM.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRM.java
@@ -563,7 +563,7 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
 if (step <= _parms._min_step_size) return true;
 // Stopped when enough steps and average decrease in objective per iteration < TOLERANCE
- return model._output._iterations > 10 && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
+ return (model._output._iterations >= _parms._max_iterations) && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
 }
 // Regularized Cholesky decomposition using H2O implementation
@@ -2577,4 +2577,4 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
 }
 }
 }
-}
\ No newline at end of file
+}
diff --git a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
index 33cd9fb03c..70be9863a2 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
@@ -187,12 +187,12 @@ public class GLRMModel extends Model<GLRMModel, GLRMModel.GLRMParameters, GLRMMo
 super(selfKey, parms, output);
 }

- @Override protected Futures remove_impl( Futures fs ) {
- if (_output._init_key != null) _output._init_key.remove(fs);
- if (_output._x_factor_key !=null) _output._x_factor_key.remove(fs);
- if (_output._representation_key != null) _output._representation_key.remove(fs);
+ @Override protected Futures remove_impl(Futures fs, boolean cascade) {
+ Keyed.remove(_output._init_key, fs, true);
+ Keyed.remove(_output._x_factor_key, fs, true);
+ Keyed.remove(_output._representation_key, fs, true);

- return super.remove_impl(fs);
+ return super.remove_impl(fs, cascade);
 }
}}
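
The first hunk in GLRM.java changes when the tolerance-based stop is allowed to fire. The following is only a minimal Python sketch of the two boolean expressions from that hunk. The function names and the `TOLERANCE` value are assumptions made for illustration and are not taken from the H2O source; only the shape of the conditions mirrors the diff.

```python
# Assumed tolerance, for illustration only (the real constant lives in GLRM.java).
TOLERANCE = 1e-6


def converged_old(iterations, steps_in_row, avg_change_obj, tol=TOLERANCE):
    """Condition as written in 3.22.1.4: may stop any time after 10 iterations."""
    return iterations > 10 and steps_in_row > 3 and abs(avg_change_obj) < tol


def converged_new(iterations, max_iterations, steps_in_row, avg_change_obj,
                  tol=TOLERANCE):
    """Condition as written on master: only true once max_iterations is reached."""
    return (iterations >= max_iterations
            and steps_in_row > 3
            and abs(avg_change_obj) < tol)


if __name__ == "__main__":
    # Same optimizer state, evaluated under both rules.
    state = dict(steps_in_row=4, avg_change_obj=1e-9)
    print("old rule at iteration 11:", converged_old(11, **state))
    print("new rule at iteration 11 (max 30):", converged_new(11, 30, **state))
    print("new rule at iteration 30 (max 30):", converged_new(30, 30, **state))
```

Under these assumptions, a run whose objective has already flattened out would stop shortly after iteration 10 with the old expression, while the new expression keeps it running until the iteration count reaches `_max_iterations`. Whether that difference is related to the unhealthy cluster is not established by the diff alone.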

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Some of the tests and other minor things are different too, but I do not think they are important:

diff --git a/h2o-r/h2o-package/docs/reference/h2o.glrm.html b/h2o-r/h2o-package/docs/reference/h2o.glrm.html
deleted file mode 100644
index 89577d4e70..0000000000
--- a/h2o-r/h2o-package/docs/reference/h2o.glrm.html
+++ /dev/null
@@ -1,298 +0,0 @@
-<!-- Generated by pkgdown: do not edit by hand -->
-<!DOCTYPE html>
-Generalized low rank decomposition of an H2O data frame — h2o.glrm • h2o
-library(h2o)
-h2o.init()
-ausPath <- system.file("extdata", "australia.csv", package="h2o")
-australia.hex <- h2o.uploadFile(path = ausPath)
-h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
-gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)
-# }

diff --git a/h2o-r/h2o-package/R/glrm.R b/h2o-r/h2o-package/R/glrm.R
index b01d6be69b..77b1fc40e8 100644
--- a/h2o-r/h2o-package/R/glrm.R
+++ b/h2o-r/h2o-package/R/glrm.R
@@ -51,7 +51,7 @@
 #' @references M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). {Generalized Low Rank Models}[http://arxiv.org/abs/1410.0342]. Unpublished manuscript, Stanford Electrical Engineering Department
 #' N. Halko, P.G. Martinsson, J.A. Tropp. {Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions}[http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' australia_path <- system.file("extdata", "australia.csv", package = "h2o")
@@ -93,23 +93,11 @@ h2o.glrm <- function(training_frame, cols = NULL,
 export_checkpoints_dir = NULL
 )
 {
 # Parameter list to send to model builder
 parms <- list()
 parms$training_frame <- training_frame
@@ -238,7 +226,7 @@ h2o.glrm <- function(training_frame, cols = NULL,
 #' training data;
 #' @seealso \code{\link{h2o.glrm}} for making an H2ODimReductionModel.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' iris_hf <- as.h2o(iris)
@@ -270,7 +258,7 @@ h2o.getFrame(key)
 #' down into the original feature space, where each row is one archetype.
 #' @seealso \code{\link{h2o.glrm}} for making an H2ODimReductionModel.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' iris_hf <- as.h2o(iris)

diff --git a/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
new file mode 100644
index 0000000000..96a513d339
--- /dev/null
+++ b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
@@ -0,0 +1,81 @@
+import sys, os
+sys.path.insert(1, "../../../")
+import h2o
+from tests import pyunit_utils
+from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator
+from random import randint
+import re
+import time
+import subprocess
+from subprocess import STDOUT,PIPE
+
+
+def glrm_mojo():
+def save_GLRM_mojo(model):
+def runMojoPredictOnly(tmpdir, mojoname, glrmIterNumber=100):
+if __name__ == "__main__":
+else:

diff --git a/h2o-py/h2o/estimators/glrm.py b/h2o-py/h2o/estimators/glrm.py
index 691f2f31c8..b7f2fad39b 100644
--- a/h2o-py/h2o/estimators/glrm.py
+++ b/h2o-py/h2o/estimators/glrm.py
@@ -53,8 +53,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
 @training_frame.setter
 def training_frame(self, training_frame):
 @property
@@ -68,8 +67,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
 @validation_frame.setter
 def validation_frame(self, validation_frame):
 @property
@@ -417,8 +415,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
 @user_y.setter
 def user_y(self, user_y):
 @property
@@ -432,8 +429,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
 @user_x.setter
 def user_x(self, user_x):
 @property

diff --git a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
index fe891bbe78..a84ad57095 100644
--- a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
+++ b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
@@ -31,7 +31,7 @@ public class GlrmMojoModel extends MojoModel {
 public boolean _transposed;
 public boolean _reverse_transform;
 public double _accuracyEps = 1e-10; // reconstruction accuracy A=X*Y
 // We don't really care about regularization of Y since it is changed during scoring

diff --git a/h2o-docs/src/product/data-science/glrm.rst b/h2o-docs/src/product/data-science/glrm.rst
index f15fb38dce..5c4f0d054b 100644
--- a/h2o-docs/src/product/data-science/glrm.rst
+++ b/h2o-docs/src/product/data-science/glrm.rst
@@ -85,6 +85,8 @@ Defining a GLRM Model
+- `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will automatically be exported.
+
 FAQ

    
diff --git a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
index 08f88f7f07..62e2d88db4 100644
--- a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
+++ b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
@@ -68,9 +68,10 @@ public class GLRMGridTest extends TestUtil {
 Job<Grid> gs = GridSearch.startGridSearch(gridKey, params, hyperParms);
 grid = (Grid<GLRMModel.GLRMParameters>) gs.get();
 modelKeys[i] = grid.getModelKeys();
+ final Grid.SearchFailure failures = grid.getFailures();
 // Make sure number of produced models match size of specified hyper space
 Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
 //
 // Make sure that names of used parameters match
 //
@@ -130,8 +131,9 @@ public class GLRMGridTest extends TestUtil {
 final Job<Grid> gs1 = GridSearch.startGridSearch(gridKey, params, hyperParms);
 grid = (Grid<GLRMModel.GLRMParameters>) gs1.get();
 // Make sure number of produced models match size of specified hyper space
+ Grid.SearchFailure failures = grid.getFailures();
 Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize1,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
 // Make sure that names of used parameters match
 String[] gridHyperNames1 = grid.getHyperNames();
 Arrays.sort(gridHyperNames1);
@@ -147,9 +149,10 @@ public class GLRMGridTest extends TestUtil {
 final Job<Grid> gs2 = GridSearch.startGridSearch(gridKey, params, hyperParms);
 grid = (Grid<GLRMModel.GLRMParameters>) gs2.get();
 // Make sure number of produced models match size of specified hyper space
+ failures = grid.getFailures();
 Assert.assertEquals("Size of grid should match to size of hyper space",
 hyperSpaceSize1 + hyperSpaceSize2,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
 // Make sure that names of used parameters match
 String[] gridHyperNames2 = grid.getHyperNames();
 Arrays.sort(gridHyperNames2);

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I did a git diff between version 3.22.1.4 and 3.24.0.1. There are no changes for GLRM. Need to get more info from Donna.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Here is more information on the GLRM runs:

There are 5855 columns and 1622370 rows, all numeric, about 9GB

    GLRM is called with: {"_train":{"name":"fgv_nonplugged_10pct_ratio","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":3939798252320305228,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":null,"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":0,"_stopping_metric":"AUTO","_stopping_tolerance":0.001,"_response_column":null,"_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_export_checkpoints_dir":null,"_k":30,"_max_iterations":30,"_standardize":true,"_init":"PlusPlus","_user_points":null,"_pred_indicator":true,"_estimate_k":false}

exalate-issue-sync[bot] commented 1 year ago

Wendy commented:
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java availableProcessors: 4
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap totalMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap maxMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java version: Java 1.8.0_212 (from Oracle Corporation)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: JVM launch parameters: [-Djava.net.preferIPv4Stack=true, -Dhadoop.metrics.log.level=WARN, -Xms12g, -Xmx12g, -ea, -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -Dlog4j.defaultInitOverride=true, -Dsys.ai.h2o.automl.xgboost.multinode.enabled=true, -Djava.io.tmpdir=/hdssd01/yarn/nm/usercache/svc_h2odev/appcache/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026/tmp, -Dlog4j.configuration=container-log4j.properties, -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026, -Dyarn.app.container.log.filesize=0, -Dhadoop.root.logger=INFO,CLA, -Dhadoop.root.logfile=syslog]
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: OS version: Linux 3.10.0-957.10.1.el7.x86_64 (amd64)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Machine physical memory: 503.59 GB
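
As a rough sanity check on the run described above, the sketch below estimates how large the input matrix A and the GLRM factors X and Y (with A approximated by X*Y) would be if stored as dense 8-byte doubles. It rests on several assumptions that are not confirmed in this thread: that 1622370 is the row count and 5855 the column count, that `_k` = 30 from the parameter dump is the rank, and that nothing is compressed. H2O compresses frames in memory, so real usage is presumably lower, which would be consistent with the "about 9GB" reported. The helper name `dense_bytes` is hypothetical.

```python
BYTES_PER_DOUBLE = 8
GIB = 1024 ** 3


def dense_bytes(rows, cols):
    """Bytes needed for a rows x cols matrix of uncompressed 8-byte doubles."""
    return rows * cols * BYTES_PER_DOUBLE


# Assumed shape: 1622370 rows, 5855 columns, rank k = 30.
rows, cols, k = 1_622_370, 5_855, 30

a_bytes = dense_bytes(rows, cols)  # input frame A
x_bytes = dense_bytes(rows, k)     # row factor X (rows x k)
y_bytes = dense_bytes(k, cols)     # archetypes Y (k x cols)

if __name__ == "__main__":
    for name, size in (("A", a_bytes), ("X", x_bytes), ("Y", y_bytes)):
        print(f"{name}: {size} bytes ({size / GIB:.2f} GiB)")
```

Under these assumptions the uncompressed frame is far larger than a single node's 11.50 GB heap, while X and Y are comparatively small (a few hundred megabytes and about a megabyte). This is only arithmetic on the reported sizes and does not by itself identify why the cluster became unhealthy.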

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6763
Assignee: Wendy
Reporter: Wendy
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A