h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

runit_GBM_groupsplit_czechboard_medium.R gets bad HTTP status on ModelMetrics.json #13347

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

I made the test NOPASS after this (happens repeatedly)

http://mr-0xe4:8080/job/h2o_master_DEV_runit_medium_large/1022/console
http://mr-0xe4:8080/job/h2o_master_DEV_runit_medium_large/1022/artifact/h2o-r/tests/results/testdir_algos_gbm_runit_GBM_groupsplit_czechboard_medium.R.out.txt

A snippet of the R output is pasted below:

[2015-02-03 00:05:32] [ERROR] : Error: Test failed: 'GBM Test: Classification with Checkerboard Group Split'
Not expected: Unexpected HTTP Status code: 500 Internal Server Error (url = http://127.0.0.1:44000/LATEST/ModelMetrics.json/models/GBMModel__890883e5e1781e09e3b5dddcbe194aa8/frames/board.hex)
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls)
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: withWarnings(test(conn))
5: withCallingHandlers(expr, warning = wHandler)
6: test(conn)
7: h2o.performance(drfmodel.nogrp, board.hex)
8: .h2o.remoteSend(model@conn, method = "POST", .h2o.MODEL_METRICS(model@key, data@key), .params = parms)
9: .h2o.fromJSON(.h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method))
10: processMatrices(fromJSON(txt, ...))
11: fromJSON(txt, ...)
12: .h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method)
13: stop(sprintf("Unexpected HTTP Status code: %d %s (url = %s)", rv$httpStatusCode, rv$httpStatusMessage, rv$url))
14: .handleSimpleError(function (e) { e$calls <- head(sys.calls()[-seq_len(frame + 7)], -2) signalCondition(e) }, "Unexpected HTTP Status code: 500 Internal Server Error (url = http://127.0.0.1:44000/LATEST/ModelMetrics.json/models/GBMModel__890883e5e1781e09e3b5dddcbe194aa8/frames/board.hex)", quote(.h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method))).


SUMMARY OF RESULTS


Total tests: 25
Passed: 10
Did not pass: 15
Did not complete: 0
Tolerated NOPASS: 14

Total time: 1584.36 sec
Time/completed test: 63.37 sec

True fail list: runit_GBM_groupsplit_czechboard_medium.R

Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording fingerprints
Extended Email Publisher is currently disabled in project settings
Finished: FAILURE

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R.

setwd(normalizePath(dirname(R.utils::commandArgs(asValues=TRUE)$"f")))
source('../../h2o-runit.R')
Loading required package: R.oo
Loading required package: R.methodsS3
R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.

Attaching package: ‘R.oo’

The following objects are masked from ‘package:methods’:

getClasses, getMethods

The following objects are masked from ‘package:base’:

attach, detach, gc, load, save

R.utils v1.34.0 (2014-10-07) successfully loaded. See ?R.utils for help.

Attaching package: ‘R.utils’

The following object is masked from ‘package:utils’:

timestamp

The following objects are masked from ‘package:base’:

cat, commandArgs, getOption, inherits, isOpen, parse, warnings

[2015-02-03 00:04:58] [INFO]: ============== Setting up R-Unit environment... ================

[2015-02-03 00:04:58] [INFO]: Check that H2O R package matches version on server

Loading required package: RCurl
Loading required package: bitops

Attaching package: ‘RCurl’

The following object is masked from ‘package:R.utils’:

reset

The following object is masked from ‘package:R.oo’:

clone

Loading required package: rjson
Loading required package: tools
Loading required package: statmod


Your next step is to start H2O and get a connection object (named 'localH2O', for example):

localH2O = h2o.init()

For H2O package documentation, ask for help:

??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.0xdata.com


Successfully connected to http://127.0.0.1:44000/

R is connected to H2O cluster:
H2O cluster uptime: 22 minutes 50 seconds
H2O cluster version: 0.1.23.99999
H2O cluster name: H2O_runit_jenkins_9296245
H2O cluster total nodes: 5
H2O cluster total memory: 88.89 GB
H2O cluster total cores: 120
H2O cluster allowed cores: 120
H2O cluster healthy: TRUE

[2015-02-03 00:05:00] [INFO]: Checking Package dependencies for this test.

[2015-02-03 00:05:00] [INFO]: Loading RUnit and testthat and R.utils

Loading required package: RUnit
Loading required package: testthat

Attaching package: ‘testthat’

The following object is masked from ‘package:R.oo’:

equals

[2015-02-03 00:05:00] [INFO]: Loading other required test packages
Loading required package: glmnet
Loading required package: Matrix
Loaded glmnet 1.9-8

Loading required package: gbm
Loading required package: survival
Loading required package: splines
Loading required package: lattice
Loading required package: parallel
Loaded gbm 2.1
Loading required package: ROCR
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

lowess

[INFO]: Using SEED: 1959759076

[SEED] : 1959759076

Appending REST API transactions to log file ./Rsandbox_runit_GBM_groupsplit_czechboard_medium.R/rest.log

test.GBM.Czechboard <- function(conn) {
  # Training set has checkerboard pattern
  Log.info("Importing czechboard_300x300.csv data...\n")
  board.hex <- h2o.uploadFile(conn, locate("smalldata/gbm_test/czechboard_300x300.csv"), key = "board.hex")
  board.hex[,3] <- as.factor(board.hex[,3])
  Log.info("Summary of czechboard_300x300.csv from H2O:\n")
  print(summary(board.hex))

  # Train H2O GBM Model:
  Log.info("H2O GBM (Naive Split) with parameters:\nntrees = 50, max_depth = 20, nbins = 500\n")
  drfmodel.nogrp <- h2o.gbm(x = c("C1", "C2"), y = "C3", training_frame = board.hex, ntrees = 50, max_depth = 20, nbins = 500, group_split = FALSE)
  print(drfmodel.nogrp)
  drfmodel.nogrp.perf <- h2o.performance(drfmodel.nogrp, board.hex)

  Log.info("H2O GBM (Group Split) with parameters:\nntrees = 50, max_depth = 20, nbins = 500\n")
  drfmodel.grpsplit <- h2o.gbm(x = c("C1", "C2"), y = "C3", training_frame = board.hex, ntrees = 50, max_depth = 20, nbins = 500, group_split = TRUE)
  print(drfmodel.grpsplit)
  drfmodel.grpsplit.perf <- h2o.performance(drfmodel.grpsplit, board.hex)

  expect_true(drfmodel.grpsplit.perf@metrics$auc$AUC >= drfmodel.nogrp.perf@metrics$auc$AUC)
  expect_true(drfmodel.grpsplit.perf@metrics$cm$table[3,3] <= drfmodel.nogrp.perf@metrics$cm$table[3,3])
  testEnd()
}
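
For readers without access to czechboard_300x300.csv, the shape of the data can be approximated in base R. This is only a sketch: it assumes the label is the parity of the row and column indices, which the log does not confirm, and the names `grid` and `board` are made up for illustration. It does reproduce the counts in the summary printed by the test (300 rows per level of C1 and C2, 45000 rows per class).

```r
# Hypothetical reconstruction of a 300 x 300 checkerboard data set.
# Assumption: the label is the parity of (i + j); the real
# czechboard_300x300.csv may be laid out differently.
n <- 300
grid <- expand.grid(i = seq_len(n), j = seq_len(n))

board <- data.frame(
  C1 = factor(grid$i),                 # 300 categorical levels
  C2 = factor(grid$j),                 # 300 categorical levels
  C3 = factor((grid$i + grid$j) %% 2)  # alternating 0/1 label
)

# Row count and class balance
print(nrow(board))
print(table(board$C3))

# Within a single level of C1 the classes are evenly split,
# so neither predictor is informative on its own.
print(table(board$C3[board$C1 == "1"]))
```

Under that assumption a split on any single level of one column separates nothing, which is presumably why the test compares naive splitting against group splitting.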

doTest("GBM Test: Classification with Checkerboard Group Split", test.GBM.Czechboard)
[2015-02-03 00:05:05] [INFO]: ======================== Begin Test ===========================

[2015-02-03 00:05:05] [INFO]: Importing czechboard_300x300.csv data...

0%
====================================================================== 100%

[2015-02-03 00:05:07] [INFO]: Summary of czechboard_300x300.csv from H2O:

C1     C2     C3
a:300  a:300  0:45000
b:300  b:300  1:45000
c:300  c:300
d:300  d:300
e:300  e:300
f:300  f:300
[2015-02-03 00:05:08] [INFO]: H2O GBM (Naive Split) with parameters: ntrees = 50, max_depth = 20, nbins = 500

0%
====================================================================== 100%

H2OBinomialModel: gbm

Model Details:

Mean Square Error for Train Frame
 [1] 0.2500000 0.2493672 0.2488946 0.2486529 0.2484250 0.2482345 0.2480668
 [8] 0.2478935 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.2456820 0.0000000
[22] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[29] 0.0000000 0.0000000 0.0000000 0.2433153 0.0000000 0.0000000 0.0000000
[36] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[43] 0.0000000 0.2411938 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
[50] 0.0000000 0.2400036

Mean Square Error for Validation Frame
[,1]
[1,] "NaN"
[2,] "NaN"
[3,] "NaN"
[4,] "NaN"
[5,] "NaN"
[6,] "NaN"
[7,] "NaN"
[8,] "NaN"
[9,] "0"
[10,] "0"
[11,] "0"
[12,] "0"
[13,] "0"
[14,] "0"
[15,] "0"
[16,] "0"
[17,] "0"
[18,] "0"
[19,] "0"
[20,] "NaN"
[21,] "0"
[22,] "0"
[23,] "0"
[24,] "0"
[25,] "0"
[26,] "0"
[27,] "0"
[28,] "0"
[29,] "0"
[30,] "0"
[31,] "0"
[32,] "NaN"
[33,] "0"
[34,] "0"
[35,] "0"
[36,] "0"
[37,] "0"
[38,] "0"
[39,] "0"
[40,] "0"
[41,] "0"
[42,] "0"
[43,] "0"
[44,] "NaN"
[45,] "0"
[46,] "0"
[47,] "0"
[48,] "0"
[49,] "0"
[50,] "0"
[51,] "NaN"
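
The two MSE listings above line up: the training MSE is nonzero exactly where the validation entry is "NaN". A minimal base R sketch for pulling out those entries follows. It assumes, without confirmation from the log, that a zero marks an iteration that was not scored and not a perfect fit, and that index 1 corresponds to 0 trees; `train_mse` and `scored` are illustrative names.

```r
# Training MSE values copied from the log above (51 entries).
train_mse <- c(
  0.2500000, 0.2493672, 0.2488946, 0.2486529, 0.2484250, 0.2482345, 0.2480668,
  0.2478935, rep(0, 11), 0.2456820, rep(0, 11), 0.2433153, rep(0, 11),
  0.2411938, rep(0, 6), 0.2400036
)
stopifnot(length(train_mse) == 51)

# Assumption: a zero is a placeholder for an unscored iteration.
scored <- which(train_mse != 0)

# Assumption: index 1 is the model with 0 trees, so trees = index - 1.
print(data.frame(trees = scored - 1, mse = train_mse[scored]))
```

Read this way, the model was scored after each of the first few trees, then at wider intervals, and again at the final tree.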

######## ### #### ##

[2015-02-03 00:05:32] [ERROR] : Error: Test failed: 'GBM Test: Classification with Checkerboard Group Split'
Not expected: Unexpected HTTP Status code: 500 Internal Server Error (url = http://127.0.0.1:44000/LATEST/ModelMetrics.json/models/GBMModel__890883e5e1781e09e3b5dddcbe194aa8/frames/board.hex)
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls)
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: withWarnings(test(conn))
5: withCallingHandlers(expr, warning = wHandler)
6: test(conn)
7: h2o.performance(drfmodel.nogrp, board.hex)
8: .h2o.remoteSend(model@conn, method = "POST", .h2o.MODEL_METRICS(model@key, data@key), .params = parms)
9: .h2o.fromJSON(.h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method))
10: processMatrices(fromJSON(txt, ...))
11: fromJSON(txt, ...)
12: .h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method)
13: stop(sprintf("Unexpected HTTP Status code: %d %s (url = %s)", rv$httpStatusCode, rv$httpStatusMessage, rv$url))
14: .handleSimpleError(function (e) { e$calls <- head(sys.calls()[-seq_len(frame + 7)], -2) signalCondition(e) }, "Unexpected HTTP Status code: 500 Internal Server Error (url = http://127.0.0.1:44000/LATEST/ModelMetrics.json/models/GBMModel__890883e5e1781e09e3b5dddcbe194aa8/frames/board.hex)", quote(.h2o.doSafeREST(conn = conn, urlSuffix = page, parms = .params, method = method))).

SEED used: 1959759076

[2015-02-03 00:05:32] [ERROR] : TEST FAILED No traceback available
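
Frame 13 of the traceback shows the error being raised with `stop(sprintf(...))` on the fields of a response object `rv`. The sketch below imitates that pattern in base R so the message format can be reproduced without a running cluster. `check_http_status` is a hypothetical helper, not part of the H2O R package, `rv` is a plain list, the check against 200 is an assumption, and the model key in the URL is a placeholder.

```r
# Hypothetical helper modeled on the stop(sprintf(...)) call in the traceback.
# Assumption: any status other than 200 is treated as unexpected.
check_http_status <- function(rv) {
  if (rv$httpStatusCode != 200) {
    stop(sprintf("Unexpected HTTP Status code: %d %s (url = %s)",
                 rv$httpStatusCode, rv$httpStatusMessage, rv$url))
  }
  invisible(rv)
}

# Simulated response; <model_key> stands in for a real model key.
rv <- list(
  httpStatusCode = 500L,
  httpStatusMessage = "Internal Server Error",
  url = "http://127.0.0.1:44000/LATEST/ModelMetrics.json/models/<model_key>/frames/board.hex"
)

# Capture the condition message instead of aborting the script.
msg <- tryCatch(
  {
    check_http_status(rv)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Catching the condition like this keeps the server's status text available for logging; it does not say anything about why the server returned 500.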

exalate-issue-sync[bot] commented 1 year ago

Kevin Normoyle commented: I'm making this a NOPASS test tonight.

runit_GBM_groupsplit_czechboard_medium.R

There's enough other confusion in moving to the new Jenkins jobs for high throughput that we need all green; it's too hard to debug other infrastructure issues if the tests fail intermittently.

I don't know if this was introduced today or not (I think it was introduced earlier today).

I can change it back tomorrow if you like.

I noticed I did get a pass, then it failed with no pushes in between,

so maybe it's something intermittent. It fails a lot now (all the time?).

The other tests in the job don't fail.

It's running on a machine with 128 GB of DRAM, with a single cloud of 5 JVMs of 20 GB each.

Wiping output directory...

Wiping test state (including random seeds)...

Wiping output directory...

Wiping test state (including random seeds)...

Starting clouds...

Waiting for H2O nodes to come up...

H2O Cloud 0 Node 0 started with output file /home4/jenkins/slave_dir_from_mr-0xe4/workspace/h2o_master_DEV_runit_medium_large/h2o-r/tests/results/java_0_0.out.txt
H2O Cloud 0 Node 1 started with output file /home4/jenkins/slave_dir_from_mr-0xe4/workspace/h2o_master_DEV_runit_medium_large/h2o-r/tests/results/java_0_1.out.txt
H2O Cloud 0 Node 2 started with output file /home4/jenkins/slave_dir_from_mr-0xe4/workspace/h2o_master_DEV_runit_medium_large/h2o-r/tests/results/java_0_2.out.txt
H2O Cloud 0 Node 3 started with output file /home4/jenkins/slave_dir_from_mr-0xe4/workspace/h2o_master_DEV_runit_medium_large/h2o-r/tests/results/java_0_3.out.txt
H2O Cloud 0 Node 4 started with output file /home4/jenkins/slave_dir_from_mr-0xe4/workspace/h2o_master_DEV_runit_medium_large/h2o-r/tests/results/java_0_4.out.txt

Setting up R H2O package...

Starting 25 tests on 1 clouds with 5 total H2O nodes...
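
Since the test passed once and then failed with no pushes in between, one way to put a number on "fails a lot" is to rerun it several times and tally the outcomes. The base R sketch below shows the idea only. `flaky_test` is a stand-in that fails at random, `run_repeatedly` is an illustrative name, and a real harness would invoke the R unit test script against the cluster.

```r
# Stand-in for a real test run; fails at random to mimic an
# intermittent failure. Hypothetical, not the actual test.
flaky_test <- function() {
  if (runif(1) < 0.5) {
    stop("Unexpected HTTP Status code: 500 Internal Server Error")
  }
  TRUE
}

# Run a test function n times and count passes and failures.
run_repeatedly <- function(test_fn, n) {
  outcomes <- vapply(seq_len(n), function(k) {
    tryCatch(
      {
        test_fn()
        "pass"
      },
      error = function(e) "fail"
    )
  }, character(1))
  table(factor(outcomes, levels = c("pass", "fail")))
}

set.seed(1959759076)  # seed value borrowed from the log, for reproducibility only
tally <- run_repeatedly(flaky_test, 20)
print(tally)
```

A tally over a fixed number of runs with no code changes makes it easier to tell "always fails" from "fails sometimes", which is the question raised in the comment.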

exalate-issue-sync[bot] commented 1 year ago

Arno Candel commented: Can you please check this again?

exalate-issue-sync[bot] commented 1 year ago

Sebastian Czarnota commented: This test was removed; adding it back in. It now passes.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-347
Assignee: UNASSIGNED
Reporter: Kevin Normoyle
State: Resolved
Fix Version: N/A
Attachments: N/A
Development PRs: N/A