h2oai / h2o-3


glm multinomial fails with nfold or fold column cross validation #15403

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This happens with both Tibshirani 8 and the latest nightly (3.7.0.3300).

I manually generated a fold column, 'x'. I also tried nfolds=10 and got the same result. (I want to do both lambda search and n-fold cross-validation, so I'm managing both myself.) The data has 7381 observations (rows).

{code:r}
model <- h2o.glm(x = x.columns, y = y.column.h2o, training_frame = h2o.train,
                 family = "multinomial", alpha = 1, lambda = lvalue,
                 fold_column = fold.column)
{code}

{noformat}
Got exception 'class java.lang.AssertionError', with msg '7381 != 6643'
java.lang.AssertionError: 7381 != 6643
    at hex.glm.GLM$GLMSingleLambdaTsk$MultinomialLineSearchIteration.callback(GLM.java:2141)
    at hex.glm.GLM$GLMSingleLambdaTsk$MultinomialLineSearchIteration.callback(GLM.java:2124)
    at water.H2O$H2OCallback.onCompletion(H2O.java:1116)
    at jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425)
    at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:383)
    at water.MRTask.compute2(MRTask.java:689)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1060)
    at hex.glm.GLMTask$GLMMultinomialLineSearchTask$Icer.compute1(GLMTask$GLMMultinomialLineSearchTask$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1056)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
{noformat}
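
The two numbers in the assertion look consistent with 10-fold cross-validation: holding out about one tenth of the 7381 rows leaves 6643 for training. The arithmetic check below is only a sketch. It assumes the assertion compares the full frame's row count against one fold's training subset, which is a guess and has not been confirmed against GLM.java. The variable names are illustrative.

```r
# Row counts taken from the assertion message '7381 != 6643'.
n_rows <- 7381   # rows in the full training frame
nfolds <- 10     # number of folds used in the nfolds run

# Assumption: each holdout fold gets roughly n_rows / nfolds rows.
holdout_rows <- n_rows %/% nfolds     # integer division: 738
train_rows   <- n_rows - holdout_rows # rows left for training in one fold: 6643

print(train_rows)
```

If that reading is right, the mismatch would occur in any cross-validated multinomial run, whether the folds come from nfolds or from a fold column.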

exalate-issue-sync[bot] commented 1 year ago

Sander Maijers commented: I get the same error with cross-validation in general, both with and without a custom fold column.

DinukaH2O commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2498
Assignee: Brandon Hill
Reporter: Former user
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A