h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

scoring with interaction pairs not working #7527

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Running the following code produces an error when the model is scored:

import sys, os
sys.path.insert(1, "../../../")
import h2o
from tests import pyunit_utils
import tempfile


def glm_mojo_all_interaction_test_large():
    seed = 12345
    # Random categorical columns: 30 levels, 20 levels, and four columns with 5 levels each
    bigCat = pyunit_utils.random_dataset_enums_only(10000, 1, factorL=30, misFrac=0.01, randSeed=seed)
    bitCat2 = pyunit_utils.random_dataset_enums_only(10000, 1, factorL=20, misFrac=0.01, randSeed=seed)
    smallCats = pyunit_utils.random_dataset_enums_only(10000, 4, factorL=5, misFrac=0.01, randSeed=seed)
    # Five random numeric columns; the first one is used as the response
    numerics = pyunit_utils.random_dataset_real_only(10000, 5, realR=100, misFrac=0.01, randSeed=seed)
    dataframe = numerics.cbind(smallCats.cbind(bitCat2.cbind(bigCat)))
    dataframe.set_names(["response", "n1", "n2", "n3", "n4", "c1", "c2", "c3", "c4", "c5", "c6"])
    xcols = ["n1", "n2", "n3", "n4", "c1", "c2", "c3", "c4", "c5", "c6"]
    # Mix of categorical-numeric, categorical-categorical, and numeric-numeric interactions
    interaction_pairs = [("c1", "n1"), ("c5", "n2"), ("c1", "c2"), ("c3", "c5"), ("n3", "n4")]
    params = {'family': "gaussian", 'lambda_search': False, 'interaction_pairs': interaction_pairs,
              'standardize': False}

    TMPDIR = tempfile.mkdtemp()
    glmGaussianModel = pyunit_utils.build_save_model_generic(params, xcols, dataframe, "response", "glm",
                                                             TMPDIR)  # build and save mojo model
    MOJONAME = pyunit_utils.getMojoName(glmGaussianModel._id)
    splitFrame = dataframe.split_frame(ratios=[0.001], seed=seed)
    h2o.download_csv(splitFrame[0], os.path.join(TMPDIR, 'in.csv'))
    newTest = h2o.import_file(os.path.join(TMPDIR, 'in.csv'), header=1)  # Make sure h2o and mojo use same in.csv
    predict_h2o = glmGaussianModel.predict(newTest)  # scoring call


if __name__ == "__main__":
    pyunit_utils.standalone_test(glm_mojo_all_interaction_test_large)
else:
    glm_mojo_all_interaction_test_large()

Error:

04-23 15:28:58.212 192.168.86.41:54321 6685 0370634-12 INFO water.default: POST /4/Predictions/models/GLM_model_python_1619215877619_17/frames/in.hex, parms: {}
04-23 15:28:58.216 192.168.86.41:54321 6685 FJ-2-23 ERROR water.default: DistributedException from /192.168.86.41:54321: '63', caused by java.lang.ArrayIndexOutOfBoundsException: 63
    at water.MRTask.getResult(MRTask.java:654)
    at water.MRTask.getResult(MRTask.java:664)
    at water.MRTask.doAll(MRTask.java:524)
    at water.MRTask.doAll(MRTask.java:406)
    at water.MRTask.doAll(MRTask.java:391)
    at water.fvec.RollupStats$ComputeRollupsTask.compute2(RollupStats.java:483)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1603)
    at water.fvec.RollupStats$ComputeRollupsTask$Icer.compute1(RollupStats$ComputeRollupsTask$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1599)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
    at water.fvec.CategoricalWrappedVec$CategoricalWrappedChunk.at8_impl(CategoricalWrappedVec.java:207)
    at water.fvec.CategoricalWrappedVec$CategoricalWrappedChunk.atd_impl(CategoricalWrappedVec.java:200)
    at water.fvec.Chunk.atd(Chunk.java:260)
    at water.fvec.RollupStatsHelpers.numericChunkRollup(RollupStatsHelpers.java:41)
    at water.fvec.RollupStats.map(RollupStats.java:197)
    at water.fvec.RollupStats.access$100(RollupStats.java:30)
    at water.fvec.RollupStats$Roll.map(RollupStats.java:273)
    at water.MRTask.compute2(MRTask.java:817)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1603)
    at water.fvec.RollupStats$Roll$Icer.compute1(RollupStats$Roll$Icer.java)
    ... 6 more
Remote rollups failed with an exception, wrapping and rethrowing: DistributedException from /192.168.86.41:54321: '63', caused by java.lang.ArrayIndexOutOfBoundsException: 63
04-23 15:28:58.232 192.168.86.41:54321 6685 FJ-1-3 ERROR water.default: java.lang.RuntimeException: DistributedException from /192.168.86.41:54321: '63', caused by java.lang.ArrayIndexOutOfBoundsException: 63
    at water.fvec.RollupStats.get(RollupStats.java:362)
    at water.fvec.RollupStats.get(RollupStats.java:371)
    at water.fvec.Vec.rollupStats(Vec.java:911)
    at water.fvec.Vec.sparseRatio(Vec.java:226)
    at water.util.FrameUtils.sparseRatio(FrameUtils.java:307)
    at hex.glm.GLMScore.<init>(GLMScore.java:41)
    at hex.glm.GLMModel.makeScoringTask(GLMModel.java:1854)
    at hex.glm.GLMModel.predictScoreImpl(GLMModel.java:1869)
    at hex.Model.score(Model.java:1708)
    at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:410)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1600)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: DistributedException from /192.168.86.41:54321: '63', caused by java.lang.ArrayIndexOutOfBoundsException: 63
    at water.RPC.result(RPC.java:240)
    at water.RPC.get(RPC.java:281)
    at water.RPC.get(RPC.java:255)
    at water.fvec.RollupStats.get(RollupStats.java:356)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
    at water.fvec.CategoricalWrappedVec$CategoricalWrappedChunk.at8_impl(CategoricalWrappedVec.java:207)
    at water.fvec.CategoricalWrappedVec$CategoricalWrappedChunk.atd_impl(CategoricalWrappedVec.java:200)
    at water.fvec.Chunk.atd(Chunk.java:260)
    at water.fvec.RollupStatsHelpers.numericChunkRollup(RollupStatsHelpers.java:41)
    at water.fvec.RollupStats.map(RollupStats.java:197)
    at water.fvec.RollupStats.access$100(RollupStats.java:30)
    at water.fvec.RollupStats$Roll.map(RollupStats.java:273)
    at water.MRTask.compute2(MRTask.java:817)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1603)
    at water.fvec.RollupStats$Roll$Icer.compute1(RollupStats$Roll$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1599)
    ... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 63

Caused by: DistributedException from /192.168.86.41:54321: '63', caused by java.lang.ArrayIndexOutOfBoundsException: 63

Caused by: java.lang.ArrayIndexOutOfBoundsException: 63

04-23 15:28:58.456 192.168.86.41:54321 6685 370634-373 INFO water.default: DELETE /4/sessions/_sid_8f87, parms: {}
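
The innermost exception in the log is an ArrayIndexOutOfBoundsException with index 63, raised from CategoricalWrappedVec$CategoricalWrappedChunk.at8_impl while rollup statistics are computed for scoring. As a rough illustration of that class of failure only, and not H2O's actual implementation, the sketch below shows a lookup that remaps categorical level indices through an array and fails when it is handed an index past the end. The function name, the map, and its length of 63 are all hypothetical; the log only tells us that index 63 was out of range.

```python
# Illustrative sketch only: NOT H2O's code. All names and sizes are assumptions.

def remap_level(level_map, raw_index):
    """Translate a raw categorical level index through a lookup list."""
    if raw_index < 0 or raw_index >= len(level_map):
        # Mimic Java's array bounds check (Python would wrap negative indices).
        raise IndexError(
            "index %d out of bounds for length %d" % (raw_index, len(level_map))
        )
    return level_map[raw_index]


# Assumed: a map that covers level indices 0..62 only.
level_map = list(range(63))

print(remap_level(level_map, 10))  # an index inside the map resolves normally

try:
    remap_level(level_map, 63)  # one past the end, like the index in the log
except IndexError as exc:
    print("lookup failed:", exc)
```

Whether the real cause is a domain mismatch between the training frame and the scoring frame for the interaction columns is not established by this report; the sketch only shows what the exception type implies.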

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8122
Assignee: New H2O Bugs
Reporter: Wendy
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A