h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

http://h2o.ai

Apache License 2.0


GLRM makes cluster unhealthy #8870

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I did a git diff between 3.22.1.4 (the last working version) with master of today (Aug 5, 2019) and got the following:

```diff
diff --git a/h2o-algos/src/main/java/hex/glrm/GLRM.java b/h2o-algos/src/main/java/hex/glrm/GLRM.java
index 06880124ad..38511d65d8 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRM.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRM.java
@@ -563,7 +563,7 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
     if (step <= _parms._min_step_size) return true;
     // Stopped when enough steps and average decrease in objective per iteration < TOLERANCE
-    return model._output._iterations > 10 && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
+    return (model._output._iterations >= _parms._max_iterations) && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
   }
   // Regularized Cholesky decomposition using H2O implementation
@@ -2577,4 +2577,4 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
       }
     }
   }
-}
\ No newline at end of file
+}
diff --git a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
index 33cd9fb03c..70be9863a2 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
@@ -187,12 +187,12 @@ public class GLRMModel extends Model<GLRMModel, GLRMModel.GLRMParameters, GLRMMo
     super(selfKey, parms, output);
   }
-  @Override protected Futures remove_impl( Futures fs ) {
-    if (_output._init_key != null) _output._init_key.remove(fs);
-    if (_output._x_factor_key !=null) _output._x_factor_key.remove(fs);
-    if (_output._representation_key != null) _output._representation_key.remove(fs);
+  @Override protected Futures remove_impl(Futures fs, boolean cascade) {
+    Keyed.remove(_output._init_key, fs, true);
+    Keyed.remove(_output._x_factor_key, fs, true);
+    Keyed.remove(_output._representation_key, fs, true);
-    return super.remove_impl(fs);
+    return super.remove_impl(fs, cascade);
   }
```
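To compare the two versions of the final `return` in the GLRM stopping check, here is a minimal Python sketch. It is illustrative only: `TOLERANCE` is a placeholder value (the real constant is not shown in the diff), the `min_step_size` check and any other early returns in the Java method are left out, and the objective is assumed to have already flattened out. Under those assumptions, the 3.22.1.4 condition can end training at iteration 11, while the master condition cannot fire before `max_iterations` is reached. The thread does not establish whether this change is what made the cluster unhealthy.

```python
# Placeholder value; the actual TOLERANCE constant in GLRM.java is not shown in the diff.
TOLERANCE = 1e-6


def converged_old(iterations, steps_in_row, avg_change_obj, max_iterations):
    # 3.22.1.4: may stop once more than 10 iterations have run.
    # max_iterations is unused here; it only keeps both signatures identical.
    return iterations > 10 and steps_in_row > 3 and abs(avg_change_obj) < TOLERANCE


def converged_new(iterations, steps_in_row, avg_change_obj, max_iterations):
    # master (Aug 2019): the convergence test can only fire at max_iterations.
    return iterations >= max_iterations and steps_in_row > 3 and abs(avg_change_obj) < TOLERANCE


def first_stopping_iteration(converged, max_iterations):
    """Return the first iteration at which `converged` is True, simulating an
    objective that has already flattened out (no change, many good steps in a row)."""
    for it in range(1, max_iterations + 1):
        if converged(it, steps_in_row=5, avg_change_obj=0.0, max_iterations=max_iterations):
            return it
    return None


if __name__ == "__main__":
    print(first_stopping_iteration(converged_old, 1000))  # 11
    print(first_stopping_iteration(converged_new, 1000))  # 1000
```

With `max_iterations=30`, as in the logged run parameters further down, the sketched master condition would likewise wait until iteration 30.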

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Some of the tests and minor details differ too, but I do not think they are important:

```diff
diff --git a/h2o-r/h2o-package/docs/reference/h2o.glrm.html b/h2o-r/h2o-package/docs/reference/h2o.glrm.html
deleted file mode 100644
index 89577d4e70..0000000000
--- a/h2o-r/h2o-package/docs/reference/h2o.glrm.html
+++ /dev/null
@@ -1,298 +0,0 @@
```


Generalized low rank decomposition of an H2O data frame
Builds a generalized low rank decomposition of an H2O data frame

```r
h2o.glrm(training_frame, cols = NULL, model_id = NULL,
  validation_frame = NULL, ignore_const_cols = TRUE,
  score_each_iteration = FALSE, loading_name = NULL, transform = c("NONE",
  "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"), k = 1,
  loss = c("Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic",
  "Periodic"), loss_by_col = c("Quadratic", "Absolute", "Huber", "Poisson",
  "Hinge", "Logistic", "Periodic", "Categorical", "Ordinal"),
  loss_by_col_idx = NULL, multi_loss = c("Categorical", "Ordinal"),
  period = 1, regularization_x = c("None", "Quadratic", "L2", "L1",
  "NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
  regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
  "OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
  max_iterations = 1000, max_updates = 2000, init_step_size = 1,
  min_step_size = 1e-04, seed = -1, init = c("Random", "SVD", "PlusPlus",
  "User"), svd_method = c("GramSVD", "Power", "Randomized"), user_y = NULL,
  user_x = NULL, expand_user_y = TRUE, impute_original = FALSE,
  recover_svd = FALSE, max_runtime_secs = 0)
```
Arguments

training_frame	Id of the training data frame.
cols	(Optional) A vector containing the data columns on which k-means operates.
model_id	Destination id for this model; auto-generated if not specified.
validation_frame	Id of the validation data frame.
ignore_const_cols	`Logical`. Ignore constant columns. Defaults to TRUE.
score_each_iteration	`Logical`. Whether to score during each iteration of model training. Defaults to FALSE.
loading_name	Frame key to save resulting X
transform	Transformation of training data Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE.
k	Rank of matrix approximation Defaults to 1.
loss	Numeric loss function Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic". Defaults to Quadratic.
loss_by_col	Loss function by column (override) Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic", "Categorical", "Ordinal".
loss_by_col_idx	Loss function by column index (override)
multi_loss	Categorical loss function Must be one of: "Categorical", "Ordinal". Defaults to Categorical.
period	Length of period (only used with periodic loss function) Defaults to 1.
regularization_x	Regularization function for X matrix Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None.
regularization_y	Regularization function for Y matrix Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None.
gamma_x	Regularization weight on X matrix Defaults to 0.
gamma_y	Regularization weight on Y matrix Defaults to 0.
max_iterations	Maximum number of iterations Defaults to 1000.
max_updates	Maximum number of updates, defaults to 2*max_iterations Defaults to 2000.
init_step_size	Initial step size Defaults to 1.
min_step_size	Minimum step size Defaults to 0.0001.
seed	Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default) Defaults to -1 (time-based random number).
init	Initialization mode Must be one of: "Random", "SVD", "PlusPlus", "User". Defaults to PlusPlus.
svd_method	Method for computing SVD during initialization (Caution: Randomized is currently experimental and unstable) Must be one of: "GramSVD", "Power", "Randomized". Defaults to Randomized.
user_y	User-specified initial Y
user_x	User-specified initial X
expand_user_y	`Logical`. Expand categorical columns in user-specified initial Y Defaults to TRUE.
impute_original	`Logical`. Reconstruct original training data by reversing transform Defaults to FALSE.
recover_svd	`Logical`. Recover singular values and eigenvectors of XY Defaults to FALSE.
max_runtime_secs	Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0.

Value
Returns an object of class H2ODimReductionModel.
References
M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models [http://arxiv.org/abs/1410.0342]. Unpublished manuscript, Stanford Electrical Engineering Department
N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
See also
h2o.kmeans, h2o.svd, h2o.prcomp
Examples
```r
# NOT RUN {
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package="h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
         gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)
# }
```


Developed by Tom Kraljevic.

Site built with pkgdown.

```diff
diff --git a/h2o-r/h2o-package/R/glrm.R b/h2o-r/h2o-package/R/glrm.R
index b01d6be69b..77b1fc40e8 100644
--- a/h2o-r/h2o-package/R/glrm.R
+++ b/h2o-r/h2o-package/R/glrm.R
@@ -51,7 +51,7 @@
 #' @references M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). {Generalized Low Rank Models}[http://arxiv.org/abs/1410.0342]. Unpublished manuscript, Stanford Electrical Engineering Department
 #' N. Halko, P.G. Martinsson, J.A. Tropp. {Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions}[http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' australia_path <- system.file("extdata", "australia.csv", package = "h2o")
@@ -93,23 +93,11 @@ h2o.glrm <- function(training_frame, cols = NULL,
export_checkpoints_dir = NULL
)
{
# Validate required training_frame first and other frame args: should be a valid key or an H2OFrame object
training_frame <- .validate.H2OFrame(training_frame, required=TRUE)
validation_frame <- .validate.H2OFrame(validation_frame)
# Required args: training_frame
if (missing(training_frame)) stop("argument 'training_frame' is missing, with no default")
# Training_frame must be a key or an H2OFrame object
if (!is.H2OFrame(training_frame))
  tryCatch(training_frame <- h2o.getFrame(training_frame),
    error = function(err) {
      stop("argument 'training_frame' must be a valid H2OFrame or key")
    })
# Validation_frame must be a key or an H2OFrame object
if (!is.null(validation_frame)) {
  if (!is.H2OFrame(validation_frame))
    tryCatch(validation_frame <- h2o.getFrame(validation_frame),
      error = function(err) {
        stop("argument 'validation_frame' must be a valid H2OFrame or key")
      })
}
# Handle other args
# Parameter list to send to model builder
parms <- list()
parms$training_frame <- training_frame
@@ -238,7 +226,7 @@ h2o.glrm <- function(training_frame, cols = NULL,
 #' training data;
 #' @seealso \code{\link{h2o.glrm}} for making an H2ODimReductionModel.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' iris_hf <- as.h2o(iris)
@@ -270,7 +258,7 @@ h2o.getFrame(key)
 #' down into the original feature space, where each row is one archetype.
 #' @seealso \code{\link{h2o.glrm}} for making an H2ODimReductionModel.
 #' @examples
-#' \donttest{
+#' \dontrun{
 #' library(h2o)
 #' h2o.init()
 #' iris_hf <- as.h2o(iris)
```

```diff
diff --git a/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
new file mode 100644
index 0000000000..96a513d339
--- /dev/null
+++ b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
@@ -0,0 +1,81 @@
+import sys, os
+sys.path.insert(1, "../../../")
+import h2o
+from tests import pyunit_utils
+from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator
+from random import randint
+import re
+import time
+import subprocess
+from subprocess import STDOUT,PIPE
+
+def glrm_mojo():
+    h2o.remove_all()
+    NTESTROWS = 200    # number of test dataset rows
+    df = pyunit_utils.random_dataset("regression", seed=1234)  # generate random dataset
+    train = df[NTESTROWS:, :]
+    test = df[:NTESTROWS, :]
+    x = df.names
+    transform_types = ["NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"]
+    transformN = transform_types[randint(0, len(transform_types)-1)]
+    # build a GLRM model with random dataset generated earlier
+    glrmModel = H2OGeneralizedLowRankEstimator(k=3, transform=transformN, max_iterations=10, seed=1234)
+    glrmModel.train(x=x, training_frame=train)
+    glrmTrainFactor = h2o.get_frame(glrmModel._model_json['output']['representation_name'])
+    assert glrmTrainFactor.nrows==train.nrows, \
+        "X factor row number {0} should equal training row number {1}.".format(glrmTrainFactor.nrows, train.nrows)
+    save_GLRM_mojo(glrmModel)  # save mojo model
+    MOJONAME = pyunit_utils.getMojoName(glrmModel._id)
+    TMPDIR = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath('__file__')), "..", "results", MOJONAME))
+    h2o.download_csv(test[x], os.path.join(TMPDIR, 'in.csv'))  # save test file, h2o predict/mojo use same file
+    # test and make sure setting the iteration number did not screw up the prediction
+    predID, pred_mojo = pyunit_utils.mojo_predict(glrmModel, TMPDIR, MOJONAME, glrmIterNumber=100)  # save mojo predict
+    pred_h2o = h2o.get_frame("GLRMLoading"+predID)
+    print("Comparing mojo x Factor and model x Factor for 100 iterations")
+    pyunit_utils.compare_frames_local(pred_h2o, pred_mojo, 1, tol=1e-10)
+    # scoring with 2 iterations should be shorter than scoring with 8000 iterations
+    starttime = time.time()
+    runMojoPredictOnly(TMPDIR, MOJONAME, glrmIterNumber=8000)  # save mojo predict
+    time1000 = time.time()-starttime
+    starttime = time.time()
+    runMojoPredictOnly(TMPDIR, MOJONAME, glrmIterNumber=2)  # save mojo predict
+    time10 = time.time()-starttime
+    print("Time taken for 2 iterations: {0}s.  Time taken for 8000 iterations: {1}s.".format(time10, time1000))
+
+def save_GLRM_mojo(model):
+    # save model
+    regex = re.compile("[+\\-* !@#$%^&()={}\\[\\]|;:'\"<>,.?/]")
+    MOJONAME = regex.sub("_", model._id)
+    print("Downloading Java prediction model code from H2O")
+    TMPDIR = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath('__file__')), "..", "results", MOJONAME))
+    os.makedirs(TMPDIR)
+    model.download_mojo(path=TMPDIR)  # save mojo
+    return TMPDIR
+
+def runMojoPredictOnly(tmpdir, mojoname, glrmIterNumber=100):
+    outFileName = os.path.join(tmpdir, 'out_mojo.csv')
+    mojoZip = os.path.join(tmpdir, mojoname) + ".zip"
+    genJarDir = str.split(str(tmpdir),'/')
+    genJarDir = '/'.join(genJarDir[0:genJarDir.index('h2o-py')])  # locate directory of genmodel.jar
+    java_cmd = ["java", "-ea", "-cp", os.path.join(genJarDir, "h2o-assemblies/genmodel/build/libs/genmodel.jar"),
+                "-Xmx12g", "-XX:MaxPermSize=2g", "-XX:ReservedCodeCacheSize=256m", "hex.genmodel.tools.PredictCsv",
+                "--input", os.path.join(tmpdir, 'in.csv'), "--output",
+                outFileName, "--mojo", mojoZip, "--decimal"]
+    java_cmd.append("--glrmIterNumber")
+    java_cmd.append(str(glrmIterNumber))
+    p = subprocess.Popen(java_cmd, stdout=PIPE, stderr=STDOUT)
+    o, e = p.communicate()
+
+if __name__ == "__main__":
+    pyunit_utils.standalone_test(glrm_mojo)
+else:
+    glrm_mojo()
```

```diff
diff --git a/h2o-py/h2o/estimators/glrm.py b/h2o-py/h2o/estimators/glrm.py
index 691f2f31c8..b7f2fad39b 100644
--- a/h2o-py/h2o/estimators/glrm.py
+++ b/h2o-py/h2o/estimators/glrm.py
@@ -53,8 +53,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
     @training_frame.setter
     def training_frame(self, training_frame):
-        assert_is_type(training_frame, None, H2OFrame)
-        self._parms["training_frame"] = training_frame
+        self._parms["training_frame"] = H2OFrame._validate(training_frame, 'training_frame')
 
     @property
@@ -68,8 +67,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
     @validation_frame.setter
     def validation_frame(self, validation_frame):
-        assert_is_type(validation_frame, None, H2OFrame)
-        self._parms["validation_frame"] = validation_frame
+        self._parms["validation_frame"] = H2OFrame._validate(validation_frame, 'validation_frame')
 
     @property
@@ -417,8 +415,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
     @user_y.setter
     def user_y(self, user_y):
-        assert_is_type(user_y, None, H2OFrame)
-        self._parms["user_y"] = user_y
+        self._parms["user_y"] = H2OFrame._validate(user_y, 'user_y')
 
     @property
@@ -432,8 +429,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
     @user_x.setter
     def user_x(self, user_x):
-        assert_is_type(user_x, None, H2OFrame)
-        self._parms["user_x"] = user_x
+        self._parms["user_x"] = H2OFrame._validate(user_x, 'user_x')
 
     @property
```

```diff
diff --git a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
index fe891bbe78..a84ad57095 100644
--- a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
+++ b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
@@ -31,7 +31,7 @@ public class GlrmMojoModel extends MojoModel {
   public boolean _transposed;
   public boolean _reverse_transform;
   public double _accuracyEps = 1e-10; // reconstruction accuracy A=X*Y
-  public int _iterNumber = 100; // maximum number of iterations to perform X update.
+  public int _iterNumber = 100; // maximum number of iterations to perform X update.  Default is 100
 
   // We don't really care about regularization of Y since it is changed during scoring
```
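Per the field comment, `_iterNumber` caps the iterations spent on the X update when the MOJO scores a row, which is why the new test times `glrmIterNumber=2` against `glrmIterNumber=8000`. Below is a minimal sketch of that idea for quadratic loss only: Y is held fixed and x is fitted to one row by plain gradient descent. This is a stand-in written for illustration, not the algorithm implemented in `GlrmMojoModel`; `score_row` is a hypothetical name, and its `accuracy_eps` stopping test only loosely mirrors `_accuracyEps`.

```python
def score_row(a, Y, iter_number=100, accuracy_eps=1e-10):
    """Hypothetical helper: find x (length k) so that x Y approximates row a.

    Y is a fixed k x n matrix given as a list of lists. The quadratic loss
    0.5 * ||x Y - a||^2 is minimized by plain gradient descent, stopping after
    iter_number updates or once no entry of x moves by accuracy_eps or more.
    Returns (x, number_of_updates_performed).
    """
    k, n = len(Y), len(Y[0])
    # Step 1 / ||Y||_F^2 is safe: the gradient's Lipschitz constant
    # (largest eigenvalue of Y Y^T) never exceeds ||Y||_F^2.
    step = 1.0 / sum(v * v for row in Y for v in row)
    x = [0.0] * k
    updates = 0
    for _ in range(iter_number):
        # residual r = x Y - a (length n)
        r = [sum(x[i] * Y[i][j] for i in range(k)) - a[j] for j in range(n)]
        # gradient g = r Y^T (length k)
        g = [sum(r[j] * Y[i][j] for j in range(n)) for i in range(k)]
        x_new = [x[i] - step * g[i] for i in range(k)]
        change = max(abs(x_new[i] - x[i]) for i in range(k))
        x = x_new
        updates += 1
        if change < accuracy_eps:
            break
    return x, updates


if __name__ == "__main__":
    Y = [[1.0, 0.0], [0.0, 1.0]]   # toy 2 x 2 archetype matrix
    row = [3.0, -2.0]
    print(score_row(row, Y, iter_number=2))    # ([2.25, -1.5], 2)
    print(score_row(row, Y, iter_number=100))  # x close to [3.0, -2.0] after 35 updates
```

With a cap of 2 the returned factor is still far from the least-squares answer; with a cap of 100 the loop exits early on the accuracy test. A large cap therefore only costs time on rows that converge slowly.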

```diff
diff --git a/h2o-docs/src/product/data-science/glrm.rst b/h2o-docs/src/product/data-science/glrm.rst
index f15fb38dce..5c4f0d054b 100644
--- a/h2o-docs/src/product/data-science/glrm.rst
+++ b/h2o-docs/src/product/data-science/glrm.rst
@@ -85,6 +85,8 @@ Defining a GLRM Model
 -  `max_runtime_secs <algo-params/max_runtime_secs.html>`__: Specify the maximum allowed runtime in seconds for model training. Use 0 to disable.
 
+-  `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will automatically be exported.
 FAQ
```



```diff
diff --git a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
index 08f88f7f07..62e2d88db4 100644
--- a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
+++ b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
@@ -68,9 +68,10 @@ public class GLRMGridTest extends TestUtil {
         Job<Grid> gs = GridSearch.startGridSearch(gridKey, params, hyperParms);
         grid = (Grid<GLRMModel.GLRMParameters>) gs.get();
         modelKeys[i] = grid.getModelKeys();
+        final Grid.SearchFailure failures = grid.getFailures();
         // Make sure number of produced models match size of specified hyper space
         Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize,
-            grid.getModelCount() + grid.getFailureCount());
+            grid.getModelCount() + failures.getFailureCount());
         //
         // Make sure that names of used parameters match
         //
@@ -130,8 +131,9 @@ public class GLRMGridTest extends TestUtil {
       final Job<Grid> gs1 = GridSearch.startGridSearch(gridKey, params, hyperParms);
       grid = (Grid<GLRMModel.GLRMParameters>) gs1.get();
       // Make sure number of produced models match size of specified hyper space
+      Grid.SearchFailure failures = grid.getFailures();
       Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize1,
-          grid.getModelCount() + grid.getFailureCount());
+          grid.getModelCount() + failures.getFailureCount());
       // Make sure that names of used parameters match
       String[] gridHyperNames1 = grid.getHyperNames();
       Arrays.sort(gridHyperNames1);
@@ -147,9 +149,10 @@ public class GLRMGridTest extends TestUtil {
       final Job<Grid> gs2 = GridSearch.startGridSearch(gridKey, params, hyperParms);
       grid = (Grid<GLRMModel.GLRMParameters>) gs2.get();
       // Make sure number of produced models match size of specified hyper space
+      failures = grid.getFailures();
       Assert.assertEquals("Size of grid should match to size of hyper space",
           hyperSpaceSize1 + hyperSpaceSize2,
-          grid.getModelCount() + grid.getFailureCount());
+          grid.getModelCount() + failures.getFailureCount());
       // Make sure that names of used parameters match
       String[] gridHyperNames2 = grid.getHyperNames();
       Arrays.sort(gridHyperNames2);
```
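The assertion these hunks touch checks an invariant: every point in the hyperspace ends up either as a built model or as a recorded failure, so the two counts must add up to the hyperspace size. The diff itself only moves the failure count from `grid.getFailureCount()` to `grid.getFailures().getFailureCount()`. Here is a small Python sketch of the invariant, assuming a full Cartesian walk of the hyperspace. `run_grid`, `toy_build`, and the hyperparameter values are hypothetical and do not correspond to H2O's `GridSearch` API.

```python
from itertools import product

# Hypothetical hyperparameter lists; not the values used in GLRMGridTest.
HYPER_PARAMS = {
    "k": [2, 3],
    "transform": ["NONE", "STANDARDIZE"],
    "gamma_x": [0.0, 0.5, 1.0],
}


def hyper_space_size(hyper_params):
    """Number of combinations in a full Cartesian walk of the hyperspace."""
    size = 1
    for values in hyper_params.values():
        size *= len(values)
    return size


def run_grid(hyper_params, build):
    """Try every combination; each one ends up as a model or as a failure."""
    names = list(hyper_params)
    models, failures = [], []
    for values in product(*(hyper_params[name] for name in names)):
        params = dict(zip(names, values))
        try:
            models.append(build(params))
        except ValueError as err:
            failures.append((params, str(err)))
    return models, failures


def toy_build(params):
    # Hypothetical builder that rejects some combinations to simulate failed models.
    if params["k"] == 3 and params["gamma_x"] == 1.0:
        raise ValueError("simulated build failure")
    return params


if __name__ == "__main__":
    models, failures = run_grid(HYPER_PARAMS, toy_build)
    print(hyper_space_size(HYPER_PARAMS), len(models), len(failures))  # 12 10 2
    assert len(models) + len(failures) == hyper_space_size(HYPER_PARAMS)
```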

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: I did a git diff between version 3.22.1.4 and 3.24.0.1. There are no changes for GLRM. Need to get more info from Donna.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Here is more information on the GLRM runs:

There are 5855 columns, 1622370 columns, all numeric, about 9GB

GLRM is called with:

```
{"_train":{"name":"fgv_nonplugged_10pct_ratio","type":"Key"},"_valid":null,"_nfolds":0,"_keep_cross_validation_models":true,"_keep_cross_validation_predictions":false,"_keep_cross_validation_fold_assignment":false,"_parallelize_cross_validation":true,"_auto_rebalance":true,"_seed":3939798252320305228,"_fold_assignment":"AUTO","_categorical_encoding":"AUTO","_max_categorical_levels":10,"_distribution":"AUTO","_tweedie_power":1.5,"_quantile_alpha":0.5,"_huber_alpha":0.9,"_ignored_columns":null,"_ignore_const_cols":true,"_weights_column":null,"_offset_column":null,"_fold_column":null,"_check_constant_response":true,"_is_cv_model":false,"_score_each_iteration":false,"_max_runtime_secs":0.0,"_stopping_rounds":0,"_stopping_metric":"AUTO","_stopping_tolerance":0.001,"_response_column":null,"_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_confusion_matrix_size":20,"_checkpoint":null,"_pretrained_autoencoder":null,"_custom_metric_func":null,"_export_checkpoints_dir":null,"_k":30,"_max_iterations":30,"_standardize":true,"_init":"PlusPlus","_user_points":null,"_pred_indicator":true,"_estimate_k":false}
```

exalate-issue-sync[bot] commented 1 year ago

Wendy commented:

```
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java availableProcessors: 4
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap totalMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap maxMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java version: Java 1.8.0_212 (from Oracle Corporation)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: JVM launch parameters: [-Djava.net.preferIPv4Stack=true, -Dhadoop.metrics.log.level=WARN, -Xms12g, -Xmx12g, -ea, -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -Dlog4j.defaultInitOverride=true, -Dsys.ai.h2o.automl.xgboost.multinode.enabled=true, -Djava.io.tmpdir=/hdssd01/yarn/nm/usercache/svc_h2odev/appcache/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026/tmp, -Dlog4j.configuration=container-log4j.properties, -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026, -Dyarn.app.container.log.filesize=0, -Dhadoop.root.logger=INFO,CLA, -Dhadoop.root.logfile=syslog]
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: OS version: Linux 3.10.0-957.10.1.el7.x86_64 (amd64)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Machine physical memory: 503.59 GB
```
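Here is a back-of-the-envelope Python check of the sizes reported in this thread. The comment lists both 5855 and 1622370 as column counts, so the sketch treats them only as the two dimensions of the frame. The cell count and the size of the GLRM factors do not depend on which dimension is the row count. `K = 30` is taken from the logged `_k` parameter, and sizes assume 8-byte doubles and decimal gigabytes. This is arithmetic on the reported numbers, not a diagnosis.

```python
# Back-of-the-envelope sizing for the frame described in this thread.
# Assumption: 5855 and 1,622,370 are the two dimensions of the frame; the comment
# calls both "columns", but none of the quantities below depend on which is rows.
DIM_A = 5_855
DIM_B = 1_622_370
K = 30                    # "_k" from the logged parameters
BYTES_PER_DOUBLE = 8
GB = 1e9                  # decimal gigabytes
REPORTED_GB = 9.0         # "about 9GB" from the comment

cells = DIM_A * DIM_B
dense_gb = cells * BYTES_PER_DOUBLE / GB
bytes_per_cell = REPORTED_GB * GB / cells
# GLRM factors: X is (rows x k) and Y is (k x cols), i.e. k * (rows + cols) values.
factor_gb = K * (DIM_A + DIM_B) * BYTES_PER_DOUBLE / GB

if __name__ == "__main__":
    print(f"cells:                 {cells:,}")          # 9,498,976,350
    print(f"dense float64 size:    {dense_gb:.1f} GB")  # 76.0 GB
    print(f"reported bytes/cell:   {bytes_per_cell:.2f}")
    print(f"X + Y factors (k={K}): {factor_gb:.2f} GB")
```

For scale, the node whose log is shown reports an 11.50 GB maximum heap. The thread does not say how many nodes share the frame.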

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6763 Assignee: Wendy Reporter: Wendy State: Resolved Fix Version: N/A Attachments: N/A Development PRs: N/A
