h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

XGBoost: XGBoostModel.score(munged_bnpparibas_test_data) fails #11875

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

It seems that {{XGBoostModel.score()}} doesn't handle unlabeled test sets. I successfully built an XGBoost model on the autodl-munged BNPParibas data, but it crashed when I called {{model.predict()}} on the test set:

{quote}
INFO: POST /4/Predictions/models/XGBoost_grid_0_AutoML_20171011_152320_model_7/frames/test_munged1.hex
WARN: Test/Validation dataset is missing column 'target': substituting in a column of NaN
...
OSError: Job with key $0301ac1002c634d4ffffffff$_967c15560a64477424e09eadc12a42d4 failed with an exception: java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.
stacktrace:
java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.
    at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:92)
    at hex.ModelMetricsBinomial.make(ModelMetricsBinomial.java:71)
    at hex.tree.xgboost.XGBoostModel.makePreds(XGBoostModel.java:351)
    at hex.tree.xgboost.XGBoostModel.makeMetrics(XGBoostModel.java:301)
    at hex.tree.xgboost.XGBoostModel.score(XGBoostModel.java:462)
{quote}

See the repro scripts in the directory specified here:

https://0xdata.atlassian.net/browse/PUBDEV-4997

Run {{single_xgboost.py}} to build the model. If it's successful, run {{xval_leaderboard.py}} to load the test set and run {{model.predict()}}.
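
The log in this report suggests that binomial metrics are still requested after the missing 'target' column has been replaced with NaN, which leaves an empty label domain. The following is a minimal Python sketch of the kind of guard one might expect in the scoring path. It is not H2O's Java code: the function names {{binomial_domain}} and {{score}}, the 0.5 threshold, and the accuracy metric are all assumptions made for illustration.

```python
import math


def binomial_domain(labels):
    """Collect the distinct non-missing labels of a response column."""
    return sorted(
        {v for v in labels if not (isinstance(v, float) and math.isnan(v))}
    )


def score(predictions, labels):
    """Return predictions plus metrics, skipping metrics for unlabeled data.

    Hypothetical stand-in for the scoring step; not H2O's implementation.
    """
    domain = binomial_domain(labels)
    if not domain:
        # Unlabeled test set: there is nothing to compare against, so
        # return predictions only instead of failing on the empty domain.
        return {"predictions": predictions, "metrics": None}
    if len(domain) != 2:
        raise ValueError(f"Domain must have 2 class labels, but is {domain}")
    positive = domain[1]
    # Count rows where the thresholded prediction agrees with the label.
    hits = sum((p >= 0.5) == (y == positive) for p, y in zip(predictions, labels))
    return {"predictions": predictions, "metrics": {"accuracy": hits / len(labels)}}
```

With an all-NaN label column the sketch returns the predictions with {{metrics}} set to {{None}}, which is the behavior the reporter appears to expect from {{model.predict()}} on an unlabeled frame.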

exalate-issue-sync[bot] commented 1 year ago

Mark Landry commented: Similar behavior here.

{{
xgb <- h2o.xgboost(training_frame = splits[[1]], validation_frame = splits[[2]],
                   x = predictors, y = "target",
                   sample_rate = 0.7, col_sample_rate = 0.7, learn_rate = 0.05, max_depth = 10,
                   ntrees = 1000, score_tree_interval = 10, stopping_rounds = 1, stopping_tolerance = 0,
                   model_id = "xgb1", seed = 18273649)

p <- h2o.predict(xgb, testHex)

Error: java.lang.IllegalArgumentException: Domain must have 2 class labels, but is [] for binomial metrics.

testHex$target <- trainHex$target[1:nrow(testHex)]  ## copy arbitrary, but valid labels into prediction frame
p <- h2o.predict(g, testHex)  ## works correctly with proper results
}}
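
The workaround in this comment (copying arbitrary but valid labels into the prediction frame) can be sketched in plain Python as follows. This deliberately avoids the h2o API: rows are modeled as dictionaries, and the helper name {{with_placeholder_labels}} and its parameters are hypothetical. The copied labels carry no meaning, so any metrics computed from them should be ignored.

```python
from itertools import cycle, islice


def with_placeholder_labels(test_rows, train_labels, column="target"):
    """Return copies of test_rows with a valid label copied into `column`.

    Illustration only: the labels are arbitrary placeholders taken from
    the training data so that the response column has a valid domain.
    """
    valid = [y for y in train_labels if y is not None]
    if not valid:
        raise ValueError("no valid training labels to copy")
    # Repeat the training labels as needed to cover every test row.
    fillers = islice(cycle(valid), len(test_rows))
    return [{**row, column: y} for row, y in zip(test_rows, fillers)]
```

The input rows are left unmodified, so the unlabeled frame remains available once the underlying scoring issue is fixed.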

exalate-issue-sync[bot] commented 1 year ago

trushant.kalyanpur commented: #95589 (https://support.h2o.ai/a/tickets/95589) - Can't log into H2O instance

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4999
Assignee: Michal Kurka
Reporter: Raymond Peck
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A