Open exalate-issue-sync[bot] opened 1 year ago
Nidhi Mehta commented: #93805 (https://support.h2o.ai/a/tickets/93805) - Re: H2O proxy issue
JIRA Issue Migration Info
Jira Issue: PUBDEV-6299
Assignee: Michal Kurka
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
If the train and validation datasets contain different categorical levels and a user sets categorical_encoding to "one_hot_explicit" in XGBoost, they will see
ERRR: java.lang.AssertionError
The assertion that gets violated is here:
{code}
assert (!expensive || _valid==null || Arrays.equals(_train._names, _valid._names) || _parms._categorical_encoding == Model.Parameters.CategoricalEncodingScheme.Binary);
{code}
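One plausible reading of the failed check `Arrays.equals(_train._names, _valid._names)` is that explicit one-hot encoding turns every categorical level into its own column, so a level present in only one frame produces a different list of column names. The sketch below illustrates that idea in plain Python, without calling H2O. The helpers `one_hot_names` and `extra_levels`, the `column.level` naming scheme, and the sample levels are all hypothetical and are not taken from the H2O code base.

```python
def one_hot_names(levels_by_column):
    """Expand each categorical column into one name per level.

    The "column.level" naming is an assumption made for illustration only.
    """
    names = []
    for column, levels in levels_by_column.items():
        if levels is None:
            # Numeric column: stays a single column.
            names.append(column)
        else:
            names.extend(f"{column}.{level}" for level in sorted(levels))
    return names


def extra_levels(train_levels, valid_levels):
    """Return, per column, the levels that appear in only one of the frames."""
    diff = {}
    for column in train_levels:
        train = set(train_levels[column] or [])
        valid = set(valid_levels.get(column) or [])
        if train != valid:
            diff[column] = sorted(train ^ valid)
    return diff


# Hypothetical levels: 'lauren' exists only in the training frame.
train_levels = {"name": ["amc gremlin", "lauren"], "cylinders": None}
valid_levels = {"name": ["amc gremlin"], "cylinders": None}

train_names = one_hot_names(train_levels)
valid_names = one_hot_names(valid_levels)

print(train_names == valid_names)                 # the expanded names no longer match
print(extra_levels(train_levels, valid_levels))   # shows which level causes the mismatch
```

A check like `extra_levels` could be run on the real level lists of the two frames before training to confirm whether mismatched levels are what triggers the error.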
Code to reproduce the issue can be found here:
{code}
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

cars = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv")

# convert response column to a factor
cars["economy_20mpg"] = cars["economy_20mpg"].asfactor()

# set the predictor names and the response column name
predictors = cars.columns
response = "economy_20mpg"
cars.impute()

# split into train and validation sets
hf_train, hf_test = cars.split_frame(ratios = [.8], seed = 1234)

# create a new level in the train frame but not the validation frame
hf_train['name'] = hf_train['name'].ascharacter()
hf_train[1:30, 'name'] = 'lauren'
hf_train['name'] = hf_train['name'].asfactor()

param = {
    "ntrees" : 500,
    "max_depth" : 10,
    "learn_rate" : 0.1,
    "sample_rate" : 1.0,
    "col_sample_rate_per_tree" : 1.0,
    "min_rows" : 5,
    "seed": 4241,
    "score_tree_interval": 100,
    "categorical_encoding": "one_hot_explicit"
}
from h2o.estimators import H2OXGBoostEstimator
model = H2OXGBoostEstimator(**param)
model.train(x = predictors, y = response, training_frame = hf_train, validation_frame = hf_test)
{code}
Note that if you remove the line
"categorical_encoding": "one_hot_explicit"
and use the default encoding, XGBoost runs just fine.
The stack trace:
{code}
xgboost Model Build progress: | (failed)
OSError Traceback (most recent call last)