Closed. exalate-issue-sync[bot] closed this issue 1 year ago.
Wendy commented: Some of the tests and minor stuff are different too, but I do not think it is important:
diff --git a/h2o-r/h2o-package/docs/reference/h2o.glrm.html b/h2o-r/h2o-package/docs/reference/h2o.glrm.html
deleted file mode 100644
index 89577d4e70..0000000000
--- a/h2o-r/h2o-package/docs/reference/h2o.glrm.html
+++ /dev/null
@@ -1,298 +0,0 @@
-<!-- Generated by pkgdown: do not edit by hand -->
-<!DOCTYPE html>
-<!-- jquery -->
-<!-- Bootstrap -->
-<!-- Font Awesome icons -->
-<!-- pkgdown -->
-<!-- mathjax -->
-<!--[if lt IE 9]> ... <![endif]-->
Builds a generalized low rank decomposition of an H2O data frame
h2o.glrm(training_frame, cols = NULL, model_id = NULL,
validation_frame = NULL, ignore_const_cols = TRUE,
score_each_iteration = FALSE, loading_name = NULL, transform = c("NONE",
"STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"), k = 1,
loss = c("Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic",
"Periodic"), loss_by_col = c("Quadratic", "Absolute", "Huber", "Poisson",
"Hinge", "Logistic", "Periodic", "Categorical", "Ordinal"),
loss_by_col_idx = NULL, multi_loss = c("Categorical", "Ordinal"),
period = 1, regularization_x = c("None", "Quadratic", "L2", "L1",
"NonNegative", "OneSparse", "UnitOneSparse", "Simplex"),
regularization_y = c("None", "Quadratic", "L2", "L1", "NonNegative",
"OneSparse", "UnitOneSparse", "Simplex"), gamma_x = 0, gamma_y = 0,
max_iterations = 1000, max_updates = 2000, init_step_size = 1,
min_step_size = 1e-04, seed = -1, init = c("Random", "SVD", "PlusPlus",
"User"), svd_method = c("GramSVD", "Power", "Randomized"), user_y = NULL,
user_x = NULL, expand_user_y = TRUE, impute_original = FALSE,
recover_svd = FALSE, max_runtime_secs = 0)
training_frame | Id of the training data frame. |
---|---|
cols | (Optional) A vector containing the data columns on which k-means operates. |
model_id | Destination id for this model; auto-generated if not specified. |
validation_frame | Id of the validation data frame. |
ignore_const_cols | Logical. Ignore constant columns. Defaults to TRUE. |
score_each_iteration | Logical. Whether to score during each iteration of model training. Defaults to FALSE. |
loading_name | Frame key to save resulting X. |
transform | Transformation of training data. Must be one of: "NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE". Defaults to NONE. |
k | Rank of matrix approximation. Defaults to 1. |
loss | Numeric loss function. Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic". Defaults to Quadratic. |
loss_by_col | Loss function by column (override). Must be one of: "Quadratic", "Absolute", "Huber", "Poisson", "Hinge", "Logistic", "Periodic", "Categorical", "Ordinal". |
loss_by_col_idx | Loss function by column index (override). |
multi_loss | Categorical loss function. Must be one of: "Categorical", "Ordinal". Defaults to Categorical. |
period | Length of period (only used with periodic loss function). Defaults to 1. |
regularization_x | Regularization function for X matrix. Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None. |
regularization_y | Regularization function for Y matrix. Must be one of: "None", "Quadratic", "L2", "L1", "NonNegative", "OneSparse", "UnitOneSparse", "Simplex". Defaults to None. |
gamma_x | Regularization weight on X matrix. Defaults to 0. |
gamma_y | Regularization weight on Y matrix. Defaults to 0. |
max_iterations | Maximum number of iterations. Defaults to 1000. |
max_updates | Maximum number of updates, defaults to 2*max_iterations. Defaults to 2000. |
init_step_size | Initial step size. Defaults to 1. |
min_step_size | Minimum step size. Defaults to 0.0001. |
seed | Seed for random numbers (affects certain parts of the algo that are stochastic and those might or might not be enabled by default). Defaults to -1 (time-based random number). |
init | Initialization mode. Must be one of: "Random", "SVD", "PlusPlus", "User". Defaults to PlusPlus. |
svd_method | Method for computing SVD during initialization (Caution: Randomized is currently experimental and unstable). Must be one of: "GramSVD", "Power", "Randomized". Defaults to Randomized. |
user_y | User-specified initial Y. |
user_x | User-specified initial X. |
expand_user_y | Logical. Expand categorical columns in user-specified initial Y. Defaults to TRUE. |
impute_original | Logical. Reconstruct original training data by reversing transform. Defaults to FALSE. |
recover_svd | Logical. Recover singular values and eigenvectors of XY. Defaults to FALSE. |
max_runtime_secs | Maximum allowed runtime in seconds for model training. Use 0 to disable. Defaults to 0. |
Returns an object of class H2ODimReductionModel.
M. Udell, C. Horn, R. Zadeh, S. Boyd (2014). Generalized Low Rank Models [http://arxiv.org/abs/1410.0342]. Unpublished manuscript, Stanford Electrical Engineering Department.
N. Halko, P.G. Martinsson, J.A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions [http://arxiv.org/abs/0909.4061]. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011.
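For context, the model in the Udell et al. reference minimizes loss(A, XY) + gamma_x * r(X) + gamma_y * r(Y) over the two factors. A minimal pure-Python sketch of that objective with quadratic loss and L1 penalties (all names are hypothetical; this is not the H2O implementation):

```python
# Sketch of the GLRM objective: loss(A, X*Y) plus regularization on the
# factors. Quadratic loss and L1 penalties here; hypothetical helper
# names, not the H2O implementation.

def matmul(X, Y):
    """Multiply an n x k matrix X by a k x m matrix Y (lists of rows)."""
    k = len(Y)
    return [[sum(X[i][t] * Y[t][j] for t in range(k))
             for j in range(len(Y[0]))] for i in range(len(X))]

def glrm_objective(A, X, Y, gamma_x=0.5, gamma_y=0.0):
    """Quadratic loss ||A - XY||^2 + gamma_x*|X|_1 + gamma_y*|Y|_1."""
    XY = matmul(X, Y)
    quad = sum((A[i][j] - XY[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))
    l1 = lambda M: sum(abs(v) for row in M for v in row)
    return quad + gamma_x * l1(X) + gamma_y * l1(Y)
```

The gamma_x and gamma_y weights here play the same role as the `gamma_x`/`gamma_y` parameters in the h2o.glrm signature above.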
# NOT RUN {
library(h2o)
h2o.init()
ausPath <- system.file("extdata", "australia.csv", package = "h2o")
australia.hex <- h2o.uploadFile(path = ausPath)
h2o.glrm(training_frame = australia.hex, k = 5, loss = "Quadratic", regularization_x = "L1",
         gamma_x = 0.5, gamma_y = 0, max_iterations = 1000)
# }
diff --git a/h2o-r/h2o-package/R/glrm.R b/h2o-r/h2o-package/R/glrm.R
index b01d6be69b..77b1fc40e8 100644
--- a/h2o-r/h2o-package/R/glrm.R
+++ b/h2o-r/h2o-package/R/glrm.R
@@ -51,7 +51,7 @@
-#' \donttest{
+#' \dontrun{
@@ -93,23 +93,11 @@ h2o.glrm <- function(training_frame, cols = NULL,
export_checkpoints_dir = NULL
)
{
+  training_frame <- .validate.H2OFrame(training_frame, required=TRUE)
+  validation_frame <- .validate.H2OFrame(validation_frame)
-  if (missing(training_frame)) stop("argument 'training_frame' is missing, with no default")
-  if (!is.H2OFrame(training_frame))
-      tryCatch(training_frame <- h2o.getFrame(training_frame),
-               error = function(err) {
-                 stop("argument 'training_frame' must be a valid H2OFrame or key")
-               })
-  if (!is.null(validation_frame)) {
-      if (!is.H2OFrame(validation_frame))
-          tryCatch(validation_frame <- h2o.getFrame(validation_frame),
-                   error = function(err) {
-                     stop("argument 'validation_frame' must be a valid H2OFrame or key")
-                   })
-  }
parms <- list()
parms$training_frame <- training_frame
@@ -238,7 +226,7 @@ h2o.glrm <- function(training_frame, cols = NULL,
-#' \donttest{
+#' \dontrun{
@@ -270,7 +258,7 @@ h2o.getFrame(key)
-#' \donttest{
+#' \dontrun{
diff --git a/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
new file mode 100644
index 0000000000..96a513d339
--- /dev/null
+++ b/h2o-py/tests/testdir_javapredict/pyunit_pubdev_5858_GLRMIterNumber.py
@@ -0,0 +1,81 @@
+import sys, os
+sys.path.insert(1, "../../../")
+import h2o
+from tests import pyunit_utils
+from h2o.estimators.glrm import H2OGeneralizedLowRankEstimator
+from random import randint
+import re
+import time
+import subprocess
+from subprocess import STDOUT,PIPE
+
+
+def glrm_mojo():
+    h2o.remove_all()
+    NTESTROWS = 200    # number of test dataset rows
+    df = pyunit_utils.random_dataset("regression", seed=1234)    # generate random dataset
+    train = df[NTESTROWS:, :]
+    test = df[:NTESTROWS, :]
+    x = df.names
+    transform_types = ["NONE", "STANDARDIZE", "NORMALIZE", "DEMEAN", "DESCALE"]
+    transformN = transform_types[randint(0, len(transform_types)-1)]
+    glrmModel = H2OGeneralizedLowRankEstimator(k=3, transform=transformN, max_iterations=10, seed=1234)
+    glrmModel.train(x=x, training_frame=train)
+    glrmTrainFactor = h2o.get_frame(glrmModel._model_json['output']['representation_name'])
+    assert glrmTrainFactor.nrows==train.nrows, \
+        "X factor row number {0} should equal training row number {1}.".format(glrmTrainFactor.nrows, train.nrows)
+    save_GLRM_mojo(glrmModel)    # save mojo model
+    MOJONAME = pyunit_utils.getMojoName(glrmModel._id)
+    TMPDIR = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "results", MOJONAME))
+    h2o.download_csv(test[x], os.path.join(TMPDIR, 'in.csv'))    # save test file, h2o predict/mojo use same file
+    predID, pred_mojo = pyunit_utils.mojo_predict(glrmModel, TMPDIR, MOJONAME, glrmIterNumber=100)    # save mojo predict
+    pred_h2o = h2o.get_frame("GLRMLoading"+predID)
+    print("Comparing mojo x Factor and model x Factor for 100 iterations")
+    pyunit_utils.compare_frames_local(pred_h2o, pred_mojo, 1, tol=1e-10)
+    starttime = time.time()
+    runMojoPredictOnly(TMPDIR, MOJONAME, glrmIterNumber=8000)    # save mojo predict
+    time1000 = time.time()-starttime
+    starttime = time.time()
+    runMojoPredictOnly(TMPDIR, MOJONAME, glrmIterNumber=2)    # save mojo predict
+    time10 = time.time()-starttime
+    print("Time taken for 2 iterations: {0}s. Time taken for 8000 iterations: {1}s.".format(time10, time1000))
+
+def save_GLRM_mojo(model):
+    regex = re.compile("[+\\-* !@#$%^&()={}\\[\\]|;:'\"<>,.?/]")
+    MOJONAME = regex.sub("_", model._id)
+    print("Downloading Java prediction model code from H2O")
+    TMPDIR = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), "..", "results", MOJONAME))
+    os.makedirs(TMPDIR)
+    model.download_mojo(path=TMPDIR)    # save mojo
+    return TMPDIR
+
+def runMojoPredictOnly(tmpdir, mojoname, glrmIterNumber=100):
+    outFileName = os.path.join(tmpdir, 'out_mojo.csv')
+    mojoZip = os.path.join(tmpdir, mojoname) + ".zip"
+    genJarDir = str.split(str(tmpdir), '/')
+    genJarDir = '/'.join(genJarDir[0:genJarDir.index('h2o-py')])    # locate directory of genmodel.jar
+    java_cmd = ["java", "-ea", "-cp", os.path.join(genJarDir, "h2o-assemblies/genmodel/build/libs/genmodel.jar"),
+                "-Xmx12g", "-XX:MaxPermSize=2g", "-XX:ReservedCodeCacheSize=256m", "hex.genmodel.tools.PredictCsv",
+                "--input", os.path.join(tmpdir, 'in.csv'), "--output",
+                outFileName, "--mojo", mojoZip, "--decimal"]
+    java_cmd.append("--glrmIterNumber")
+    java_cmd.append(str(glrmIterNumber))
+    p = subprocess.Popen(java_cmd, stdout=PIPE, stderr=STDOUT)
+    o, e = p.communicate()
+
+if __name__ == "__main__":
+    pyunit_utils.standalone_test(glrm_mojo)
+else:
+    glrm_mojo()
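The --glrmIterNumber flag this test exercises bounds the per-row X update the MOJO performs at scoring time: with the archetype matrix Y held fixed, each test row's x is refined iteratively until it converges or the iteration budget is spent. A rough pure-Python sketch under quadratic loss (gradient descent with hypothetical names; the real GlrmMojoModel update differs in details):

```python
# Rough sketch of the per-row X update whose iteration count the MOJO's
# glrmIterNumber bounds: with Y fixed, gradient-descend on x to shrink
# ||a - x*Y||^2. Hypothetical code, not the GlrmMojoModel implementation.

def x_update(a, Y, iter_number=100, step=0.1, eps=1e-10):
    k, m = len(Y), len(Y[0])
    x = [0.0] * k
    for _ in range(iter_number):
        # residual of the current reconstruction x*Y against the row a
        resid = [sum(x[t] * Y[t][j] for t in range(k)) - a[j] for j in range(m)]
        grad = [2.0 * sum(resid[j] * Y[t][j] for j in range(m)) for t in range(k)]
        x = [x[t] - step * grad[t] for t in range(k)]
        if sum(g * g for g in grad) < eps:    # converged before budget spent
            break
    return x
```

More iterations means a closer reconstruction but proportionally longer scoring, which is what the 2-iteration vs. 8000-iteration timing in the test above measures.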
diff --git a/h2o-py/h2o/estimators/glrm.py b/h2o-py/h2o/estimators/glrm.py
index 691f2f31c8..b7f2fad39b 100644
--- a/h2o-py/h2o/estimators/glrm.py
+++ b/h2o-py/h2o/estimators/glrm.py
@@ -53,8 +53,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
@training_frame.setter
def training_frame(self, training_frame):
-        assert_is_type(training_frame, None, H2OFrame)
-        self._parms["training_frame"] = training_frame
+        self._parms["training_frame"] = H2OFrame._validate(training_frame, 'training_frame')
@property
@@ -68,8 +67,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
@validation_frame.setter
def validation_frame(self, validation_frame):
-        assert_is_type(validation_frame, None, H2OFrame)
-        self._parms["validation_frame"] = validation_frame
+        self._parms["validation_frame"] = H2OFrame._validate(validation_frame, 'validation_frame')
@property
@@ -417,8 +415,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
@user_y.setter
def user_y(self, user_y):
-        assert_is_type(user_y, None, H2OFrame)
-        self._parms["user_y"] = user_y
+        self._parms["user_y"] = H2OFrame._validate(user_y, 'user_y')
@property
@@ -432,8 +429,7 @@ class H2OGeneralizedLowRankEstimator(H2OEstimator):
@user_x.setter
def user_x(self, user_x):
-        assert_is_type(user_x, None, H2OFrame)
-        self._parms["user_x"] = user_x
+        self._parms["user_x"] = H2OFrame._validate(user_x, 'user_x')
@property
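The setter hunks above all make the same change: a bare type assert plus raw assignment becomes a single validate-and-assign call. A minimal sketch of that pattern (Frame and validate_frame are hypothetical stand-ins, not the actual H2OFrame._validate signature):

```python
# Sketch of the validate-and-assign setter pattern the diff moves to:
# check (and potentially coerce) the value once at assignment time
# instead of a bare type assert. Hypothetical stand-in classes.

class Frame:
    """Stand-in for H2OFrame."""
    pass

def validate_frame(value, param_name):
    """Return value if it is None or a Frame; raise TypeError otherwise."""
    if value is None or isinstance(value, Frame):
        return value
    raise TypeError("argument '%s' must be a valid Frame or None" % param_name)

class Estimator:
    def __init__(self):
        self._parms = {}

    @property
    def training_frame(self):
        return self._parms.get("training_frame")

    @training_frame.setter
    def training_frame(self, training_frame):
        # invalid values now fail here, with the parameter name in the error
        self._parms["training_frame"] = validate_frame(training_frame, "training_frame")
```

Centralizing validation in one helper keeps the error message consistent across every frame-typed parameter.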
diff --git a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
index fe891bbe78..a84ad57095 100644
--- a/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
+++ b/h2o-genmodel/src/main/java/hex/genmodel/algos/glrm/GlrmMojoModel.java
@@ -31,7 +31,7 @@ public class GlrmMojoModel extends MojoModel {
public boolean _transposed;
public boolean _reverse_transform;
public double _accuracyEps = 1e-10; // reconstruction accuracy A=X*Y
-  public int _iterNumber = 100;   // maximum number of iterations to perform X update.
+  public int _iterNumber = 100;   // maximum number of iterations to perform X update. Default is 100
// We don't really care about regularization of Y since it is changed during scoring
diff --git a/h2o-docs/src/product/data-science/glrm.rst b/h2o-docs/src/product/data-science/glrm.rst
index f15fb38dce..5c4f0d054b 100644
--- a/h2o-docs/src/product/data-science/glrm.rst
+++ b/h2o-docs/src/product/data-science/glrm.rst
@@ -85,6 +85,8 @@ Defining a GLRM Model
   `max_runtime_secs <algo-params/max_runtime_secs.html>`__: Specify the maximum allowed runtime in seconds for model training. Use 0 to disable.
+- `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will automatically be exported.
+
FAQ
diff --git a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
index 08f88f7f07..62e2d88db4 100644
--- a/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
+++ b/h2o-algos/src/test/java/hex/glrm/GLRMGridTest.java
@@ -68,9 +68,10 @@ public class GLRMGridTest extends TestUtil {
Job<Grid> gs = GridSearch.startGridSearch(gridKey, params, hyperParms);
grid = (Grid<GLRMModel.GLRMParameters>) gs.get();
modelKeys[i] = grid.getModelKeys();
+ final Grid.SearchFailure failures = grid.getFailures();
// Make sure number of produced models match size of specified hyper space
Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
//
// Make sure that names of used parameters match
//
@@ -130,8 +131,9 @@ public class GLRMGridTest extends TestUtil {
final Job<Grid> gs1 = GridSearch.startGridSearch(gridKey, params, hyperParms);
grid = (Grid<GLRMModel.GLRMParameters>) gs1.get();
// Make sure number of produced models match size of specified hyper space
+ Grid.SearchFailure failures = grid.getFailures();
Assert.assertEquals("Size of grid should match to size of hyper space", hyperSpaceSize1,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
// Make sure that names of used parameters match
String[] gridHyperNames1 = grid.getHyperNames();
Arrays.sort(gridHyperNames1);
@@ -147,9 +149,10 @@ public class GLRMGridTest extends TestUtil {
final Job<Grid> gs2 = GridSearch.startGridSearch(gridKey, params, hyperParms);
grid = (Grid<GLRMModel.GLRMParameters>) gs2.get();
// Make sure number of produced models match size of specified hyper space
+ failures = grid.getFailures();
Assert.assertEquals("Size of grid should match to size of hyper space",
hyperSpaceSize1 + hyperSpaceSize2,
- grid.getModelCount() + grid.getFailureCount());
+ grid.getModelCount() + failures.getFailureCount());
// Make sure that names of used parameters match
String[] gridHyperNames2 = grid.getHyperNames();
Arrays.sort(gridHyperNames2);
Wendy commented: I did a git diff between version 3.22.1.4 and 3.24.0.1. There are no changes for GLRM. Need to get more info from Donna.
Wendy commented: Here is more information on the glrm runs:
Wendy commented:
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java availableProcessors: 4
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap totalMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java heap maxMemory: 11.50 GB
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Java version: Java 1.8.0_212 (from Oracle Corporation)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: JVM launch parameters: [-Djava.net.preferIPv4Stack=true, -Dhadoop.metrics.log.level=WARN, -Xms12g, -Xmx12g, -ea, -verbose:gc, -XX:+PrintGCDetails, -XX:+PrintGCTimeStamps, -Dlog4j.defaultInitOverride=true, -Dsys.ai.h2o.automl.xgboost.multinode.enabled=true, -Djava.io.tmpdir=/hdssd01/yarn/nm/usercache/svc_h2odev/appcache/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026/tmp, -Dlog4j.configuration=container-log4j.properties, -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1561214520549_4438/container_e280_1561214520549_4438_01_000026, -Dyarn.app.container.log.filesize=0, -Dhadoop.root.logger=INFO,CLA, -Dhadoop.root.logfile=syslog]
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: OS version: Linux 3.10.0-957.10.1.el7.x86_64 (amd64)
07-26 08:17:40.168 10.20.33.76:55099 33517 main INFO: Machine physical memory: 503.59 GB
JIRA Issue Migration Info
Jira Issue: PUBDEV-6763
Assignee: Wendy
Reporter: Wendy
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Wendy commented: I did a git diff between 3.22.1.4 (the last working version) and master of today (Aug 5, 2019) and got the following:
diff --git a/h2o-algos/src/main/java/hex/glrm/GLRM.java b/h2o-algos/src/main/java/hex/glrm/GLRM.java
index 06880124ad..38511d65d8 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRM.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRM.java
@@ -563,7 +563,7 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
     if (step <= _parms._min_step_size) return true;
     // Stopped when enough steps and average decrease in objective per iteration < TOLERANCE
-    return model._output._iterations > 10 && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
+    return (model._output._iterations >= _parms._max_iterations) && steps_in_row > 3 && Math.abs(model._output._avg_change_obj) < TOLERANCE;
   }

   // Regularized Cholesky decomposition using H2O implementation
@@ -2577,4 +2577,4 @@ public class GLRM extends ModelBuilder<GLRMModel, GLRMModel.GLRMParameters, GLRM
     }
   }
 }
-}
\ No newline at end of file
+}
diff --git a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
index 33cd9fb03c..70be9863a2 100644
--- a/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
+++ b/h2o-algos/src/main/java/hex/glrm/GLRMModel.java
@@ -187,12 +187,12 @@ public class GLRMModel extends Model<GLRMModel, GLRMModel.GLRMParameters, GLRMMo
     super(selfKey, parms, output);
   }

-  @Override protected Futures remove_impl( Futures fs ) {
-    if (_output._init_key != null) _output._init_key.remove(fs);
-    if (_output._x_factor_key != null) _output._x_factor_key.remove(fs);
-    if (_output._representation_key != null) _output._representation_key.remove(fs);
+  @Override protected Futures remove_impl(Futures fs, boolean cascade) {
+    Keyed.remove(_output._init_key, fs, true);
+    Keyed.remove(_output._x_factor_key, fs, true);
+    Keyed.remove(_output._representation_key, fs, true);
-    return super.remove_impl(fs);
+    return super.remove_impl(fs, cascade);
   }
 }
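The first GLRM.java hunk is the behavioral change: the old early-stop test could declare convergence after as few as 11 iterations, while the new test refuses to stop before the requested max_iterations is reached. A small Python sketch of the two conditions, reduced to the boolean logic shown in the diff:

```python
# The two stopping conditions from the GLRM.java hunk, reduced to
# boolean logic (TOLERANCE and the argument names mirror the diff).

TOLERANCE = 1e-10

def stop_old(iterations, steps_in_row, avg_change_obj):
    # pre-fix: eligible to stop after only 10 iterations
    return iterations > 10 and steps_in_row > 3 and abs(avg_change_obj) < TOLERANCE

def stop_new(iterations, max_iterations, steps_in_row, avg_change_obj):
    # post-fix: never stops before the requested max_iterations
    return (iterations >= max_iterations) and steps_in_row > 3 \
        and abs(avg_change_obj) < TOLERANCE
```

With max_iterations=1000, a run whose objective plateaued at iteration 11 would stop under the old test but keep iterating under the new one, which is consistent with the iteration-count behavior the pyunit test above checks.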