h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

h2o.cut not separating by breaks correctly #8372

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

In the example below, I want to create one bin per decile of the model's predictions. Instead, the first several deciles are grouped together in a single bin, (0.0,0.001], and only four bins are returned.

{code:R}
library(h2o)
h2o.init()

df <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/demos/bank-additional-full.csv")
xgboost <- h2o.xgboost(y = "y", training_frame = df, max_depth = 1, ntrees = 1)

# Score the training frame and cut the "yes" probability at its deciles
preds <- h2o.predict(xgboost, df)
breaks <- h2o.quantile(preds$yes, probs = seq(0, 1, 0.1))
bins <- h2o.cut(preds$yes, breaks)

h2o.table(bins)
             yes Count
1    (0.0,0.001] 14099
2  (0.001,0.005]  1840
3   (0.005,0.12]  2061
4   (0.12,0.965]  1999
{code}
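
For comparison, here is a minimal base R sketch of the behaviour I would expect: ten bins, one per decile. It does not use H2O. `scores` is a hypothetical stand-in for `preds$yes`, generated with `runif()`, so the numbers are illustrative only. It also checks for tied breaks, on the assumption (not confirmed for this report) that duplicate quantile values are one way deciles could end up merged.

```r
# Hypothetical stand-in for preds$yes: 1000 random scores in [0, 1].
set.seed(42)
scores <- runif(1000)

# Decile breaks, computed the same way as in the report.
breaks <- quantile(scores, probs = seq(0, 1, 0.1))

# Tied breaks would merge deciles, so check for them before binning.
stopifnot(anyDuplicated(breaks) == 0)

# include.lowest = TRUE keeps the minimum score in the first bin;
# without it, cut() uses (lo, hi] intervals and the minimum becomes NA.
bins <- cut(scores, breaks = breaks, include.lowest = TRUE)
counts <- table(bins)
print(counts)
```

Running the same two checks on the H2O side (are the values returned by `h2o.quantile` all distinct, and do the bin counts add up to the number of rows in `preds`) might help narrow down where the deciles are being merged.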

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7263
Assignee: New H2O Bugs
Reporter: Megan Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A