h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

h2o.gbm fails when `monotone_constraints`, `distribution = "quantile"` and `nfolds` are used together #7084

Closed: exalate-issue-sync[bot] closed this issue 1 year ago

exalate-issue-sync[bot] commented 1 year ago

{code:r}
library("h2o")
packageVersion("h2o")
h2o.init()

iris_h2o = as.h2o(iris)

# this fails
h2o.gbm(
  y = "Sepal.Length",
  training_frame = iris_h2o,
  monotone_constraints = list("Sepal.Width" = 1),
  nfolds = 5,
  distribution = "quantile",
  quantile_alpha = 0.8
)

# works when nfolds is not specified
h2o.gbm(
  y = "Sepal.Length",
  training_frame = iris_h2o,
  monotone_constraints = list("Sepal.Width" = 1),
  distribution = "quantile",
  quantile_alpha = 0.8
)
{code}

exalate-issue-sync[bot] commented 1 year ago

Veronika Maurerová commented: Hi [~accountid:557058:887efb87-dcc7-4af2-97d6-aaf5038ef621] , can you please share more information - the error log and version of h2o? I tried your code and it does not fail with the latest version of h2o. Thanks!

exalate-issue-sync[bot] commented 1 year ago

Srikanth K S commented: [~accountid:5bd237b8dd3cc64b77e71676]

{code:r}library("h2o")

>

> ----------------------------------------------------------------------

>

> Your next step is to start H2O:

> > h2o.init()

>

> For H2O package documentation, ask for help:

> > ??h2o

>

> After starting H2O, you can use the Web UI at http://localhost:54321

> For more information visit https://docs.h2o.ai

>

> ----------------------------------------------------------------------

>

> Attaching package: 'h2o'

> The following objects are masked from 'package:stats':

>

> cor, sd, var

> The following objects are masked from 'package:base':

>

> &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,

> colnames<-, ifelse, is.character, is.factor, is.numeric, log,

> log10, log1p, log2, round, signif, trunc

packageVersion("h2o")

> [1] '3.36.0.2'

h2o.init()

> Connection successful!

>

> R is connected to the H2O cluster:

> H2O cluster uptime: 11 hours 18 minutes

> H2O cluster timezone: Asia/Kolkata

> H2O data parsing timezone: UTC

> H2O cluster version: 3.36.0.2

> H2O cluster version age: 22 days

> H2O cluster name: H2O_started_from_R_s0k06e8_yxx067

> H2O cluster total nodes: 1

> H2O cluster total memory: 2.92 GB

> H2O cluster total cores: 16

> H2O cluster allowed cores: 16

> H2O cluster healthy: TRUE

> H2O Connection ip: localhost

> H2O Connection port: 54321

> H2O Connection proxy: NA

> H2O Internal Security: FALSE

> H2O API Extensions: Amazon S3, XGBoost, Algos, Infogram, AutoML, Core V3, TargetEncoder, Core V4

> R Version: R version 4.1.2 (2021-11-01)

iris_h2o = as.h2o(iris)

> | | | 0% | |======================================================================| 100%

# this fails

h2o.gbm(y = "Sepal.Length", training_frame = iris_h2o, monotone_constraints = list("Sepal.Width" = 1), nfolds = 5, distribution = "quantile", quantile_alpha = 0.8)

> | | | 0%

>

> java.lang.AssertionError

>

> java.lang.AssertionError

> at hex.tree.DTree$LeafNode.compress(DTree.java:928)

> at hex.tree.DTree$DecidedNode.compress(DTree.java:903)

> at hex.tree.DTree$DecidedNode.compress(DTree.java:903)

> at hex.tree.DTree$DecidedNode.compress(DTree.java:903)

> at hex.tree.DTree.compress(DTree.java:954)

> at hex.tree.SharedTreeModel$SharedTreeOutput.addKTrees(SharedTreeModel.java:232)

> at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees(GBM.java:548)

> at hex.tree.SharedTree$Driver.scoreAndBuildTrees(SharedTree.java:479)

> at hex.tree.SharedTree$Driver.computeImpl(SharedTree.java:378)

> at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:247)

> at water.H2O$H2OCountedCompleter.compute(H2O.java:1658)

> at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)

> at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)

> at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)

> at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)

> at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

> Error: java.lang.AssertionError

# works when nfolds is not specified

h2o.gbm(y = "Sepal.Length", training_frame = iris_h2o, monotone_constraints = list("Sepal.Width" = 1), distribution = "quantile", quantile_alpha = 0.8)

> | | | 0% | |======= | 10% | |======================================================================| 100%

> Model Details:

> ==============

>

> H2ORegressionModel: gbm

> Model ID: GBM_model_R_1645035709511_23

> Model Summary:

>   number_of_trees number_of_internal_trees model_size_in_bytes min_depth
> 1              50                       50                7177         1
>   max_depth mean_depth min_leaves max_leaves mean_leaves
> 1         5    3.80000          2         12     6.76000

>

>

> H2ORegressionMetrics: gbm

> Reported on training data.

>

> MSE: 2.834132

> RMSE: 1.683488

> MAE: 1.569319

> RMSLE: 0.2230081

> Mean Residual Deviance : 0.3222256

sessionInfo()

> R version 4.1.2 (2021-11-01)

> Platform: x86_64-apple-darwin17.0 (64-bit)

> Running under: macOS Catalina 10.15.7

>

> Matrix products: default

> BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib

> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

>

> locale:

> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

>

> attached base packages:

> [1] stats graphics grDevices utils datasets methods base

>

> other attached packages:

> [1] h2o_3.36.0.2

>

> loaded via a namespace (and not attached):

> [1] pillar_1.6.4 compiler_4.1.2 highr_0.9 R.methodsS3_1.8.1

> [5] R.utils_2.11.0 bitops_1.0-7 tools_4.1.2 digest_0.6.29

> [9] bit_4.0.4 jsonlite_1.7.2 evaluate_0.14 lifecycle_1.0.1

> [13] tibble_3.1.6 R.cache_0.15.0 pkgconfig_2.0.3 rlang_0.4.12

> [17] reprex_2.0.1 yaml_2.2.1 xfun_0.29 fastmap_1.1.0

> [21] withr_2.4.3 styler_1.6.2 stringr_1.4.0 knitr_1.37

> [25] fs_1.5.2 vctrs_0.3.8 bit64_4.0.5 glue_1.5.1

> [29] data.table_1.14.2 fansi_0.5.0 rmarkdown_2.11 purrr_0.3.4

> [33] magrittr_2.0.1 backports_1.4.1 ellipsis_0.3.2 htmltools_0.5.2

> [37] utf8_1.2.2 stringi_1.7.6 RCurl_1.98-1.5 crayon_1.4.2

> [41] R.oo_1.24.0{code}

Created on 2022-02-17 by the [reprex package|https://reprex.tidyverse.org] (v2.0.1)

exalate-issue-sync[bot] commented 1 year ago

Veronika Maurerová commented: Thanks [~accountid:557058:887efb87-dcc7-4af2-97d6-aaf5038ef621]. I am running the same version and am still not able to reproduce it. From the error log, however, it looks like there is a NaN value in a LeafNode prediction, which is strange. I will continue to investigate this bug. Thank you for reporting it.

exalate-issue-sync[bot] commented 1 year ago

Veronika Maurerová commented: Hi [~accountid:557058:887efb87-dcc7-4af2-97d6-aaf5038ef621], I finally reproduced the bug and I am working on a fix now. Thanks for letting us know.

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8581
Assignee: Veronika Maurerová
Reporter: Srikanth K S
State: Resolved
Fix Version: 3.36.0.4
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/6091
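
For anyone landing on this thread: the report was made against 3.36.0.2 and the Jira details list 3.36.0.4 as the fix version. A minimal sketch of checking whether a given version predates that fix, using only base R, might look like the following. `predates_fix` is a hypothetical helper written for this illustration and is not part of the h2o package.

{code:r}
# Version numbers taken from this thread: reported on 3.36.0.2,
# fix listed for 3.36.0.4.
fix_version <- package_version("3.36.0.4")

# Hypothetical helper (not part of h2o): TRUE when `installed`
# is older than the release that carries the fix.
predates_fix <- function(installed) {
  as.package_version(installed) < fix_version
}

predates_fix("3.36.0.2")  # the reporter's version
predates_fix("3.36.0.4")  # the fix version itself

# With h2o installed, the same check would be:
# predates_fix(packageVersion("h2o"))
{code}

If the check is TRUE for the installed package, upgrading is probably the simplest route. The second call from the original report shows that omitting `nfolds` avoids the assertion on the affected version.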