Marc Burgess commented: I experimented with this a bit more. If I run a grid search over both alpha and lambda, it doesn't fail. However, if I turn on remove_collinear_columns, most of the grid models complete but a minority fail with an ArrayIndexOutOfBoundsException (I was also using 1,000,000 samples instead of 10,000).
{code}
h2ores <- h2o.grid(algorithm = "glm", grid_id = "h2o_search", y = "y",
                   training_frame = h2odat, family = "poisson",
                   interaction_pairs = list(c("v2", "v5"), c("v4", "v3")),
                   nfolds = 10, remove_collinear_columns = TRUE,
                   hyper_params = list(alpha = c(0, 0.2, 0.4, 0.6, 0.8, 1.0),
                                       lambda = c(1.0, 0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0)))
{code}
Stack trace for one of the failed grid entries:
{code}
Hyper-parameter: alpha, 1
Hyper-parameter: lambda, 1e-04
[2019-08-08 12:09:32] failure_details: NA
[2019-08-08 12:09:32] failure_stack_traces: java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at hex.DataInfo.coefNames(DataInfo.java:723)
    at hex.glm.GLM$GLMDriver.ADMM_solve(GLM.java:636)
    at hex.glm.GLM$GLMDriver.fitIRLSM(GLM.java:832)
    at hex.glm.GLM$GLMDriver.fitModel(GLM.java:1098)
    at hex.glm.GLM$GLMDriver.computeSubmodel(GLM.java:1191)
    at hex.glm.GLM$GLMDriver.computeImpl(GLM.java:1279)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:222)
    at hex.glm.GLM$GLMDriver.compute2(GLM.java:588)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1417)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
{code}
JIRA Issue Migration Info
Jira Issue: PUBDEV-6757
Assignee: New H2O Bugs
Reporter: Marc Burgess
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
I have a problem where a GLM fit (run from R) fails if I set lambda_search = TRUE and include interactions. If I set the solver to COORDINATE_DESCENT_NAIVE, it works. The error I get is "water.exceptions.H2OConcurrentModificationException: Rollups not possible, because Vec was deleted". On another dataset I instead get an ArrayIndexOutOfBoundsException under the same circumstances. I am not sure whether the different error is due to a different solver being used.
I originally encountered it when using h2o.grid, but it happens with plain h2o.glm too.
I already reported this in the Gitter chat, but thought I should formally create an issue here.
Code that reproduces the issue for me:
{code:R}
library(h2o)

set.seed(1234)
v1 <- rnorm(10000, 5, 10)
v2 <- rnorm(10000, 0, 1)
v3 <- rnorm(10000, 5, 11)
v4 <- factor(sample(c("A", "B", "C", "D", "E"), 10000, replace = TRUE, prob = c(0.4, 0.3, 0.1, 0.1, 0.1)))
v5 <- factor(sample(c("F", "T"), 10000, replace = TRUE, prob = c(0.7, 0.3)))
y <- rpois(10000, exp(0.0001 * (5 * v1 + ifelse(v5 == "T", 6, 0) * v2 + 7 * v2 + 3 * (v5 == "F") +
  ((v4 == "A") * 3 + (v4 == "B") * 6 + (v4 == "C") * 1 + (v4 == "D") * 9 + (v4 == "E") * 20) * v3 * 19)))
h2o.init(nthreads = 3, min_mem_size = "8G", enable_assertions = FALSE)

# cbind() stores the factors v4 and v5 as their integer codes, so they reach H2O as numeric columns
h2odat <- as.h2o(cbind(y, v1, v2, v3, v4, v5))

h2o.glm(y = "y", training_frame = h2odat, family = "poisson", nfolds = 10,
        interaction_pairs = list(c("v2", "v5"), c("v4", "v3")),
        lambda_search = TRUE, solver = "AUTO")
{code}
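The report says the same fit succeeds when the solver is set to COORDINATE_DESCENT_NAIVE. A minimal sketch of that workaround, assuming an H2O cluster is already running and `h2odat` has been created by the reproduction code in this issue; only the `solver` argument differs from the failing call. This is the reporter's observed workaround for `h2o.glm` with `lambda_search = TRUE`, not a confirmed fix, and it was not reported for the `remove_collinear_columns` grid failures.

{code:R}
library(h2o)

# Assumes h2o.init() has been called and h2odat exists
# (built by the reproduction code in this issue).
fit <- h2o.glm(y = "y", training_frame = h2odat, family = "poisson", nfolds = 10,
               interaction_pairs = list(c("v2", "v5"), c("v4", "v3")),
               lambda_search = TRUE,
               solver = "COORDINATE_DESCENT_NAIVE")  # instead of solver = "AUTO"

# Inspect the fitted coefficients, including the interaction terms
print(h2o.coef(fit))
{code}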