triton-inference-server / fil_backend

FIL backend for the Triton Inference Server
Apache License 2.0

Models serialized as JSON with XGBoost >=1.7 fail to load #311

Closed: kiarash-rezahanjani closed this issue 1 year ago

kiarash-rezahanjani commented 1 year ago

Hi,

I am trying to load and serve an XGBoost regressor model serialized as JSON, and I get the following error when starting the server:

I1128 10:29:44.041470 1 model_config_utils.cc:1840]     ModelConfig::sequence_batching::state::initial_state::dims
I1128 10:29:44.041695 1 model_config_utils.cc:1840]     ModelConfig::version_policy::specific::versions
I1128 10:29:44.042676 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: xgboost_regressor_0 (CPU device 0)
I1128 10:29:44.042830 1 backend_model_instance.cc:68] Creating instance xgboost_regressor_0 on CPU using artifact ''
I1128 10:29:44.054646 1 model_finalize.hpp:36] TRITONBACKEND_ModelFinalize: delete model state
E1128 10:29:44.054714 1 model_lifecycle.cc:597] failed to load 'xgboost_regressor' version 1: Unknown: [10:29:44] /rapids_triton/build/_deps/treelite-src/src/frontend/xgboost_json.cc:539: Provided JSON could not be parsed as XGBoost model. Parsing error at offset 105127: Terminate parsing due to Handler error.
ram":{"base_score":"5E-1","boost_from_average":"1","num_class":"0","num_feature":"2","num_target":"1

Here is the setup used:

Running the tritonserver Docker image, version 22.11, locally on macOS

Directory structure:

├── _reproduce_bugs
│   └── xgboost_regressor
│       ├── 1
│       │   └── xgboost.json
│       └── config.pbtxt

config.pbtxt content:

name: "xgboost_regressor"
backend: "fil"
max_batch_size: 1
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  {
    key: "model_type"
    value: { string_value: "xgboost_json" }
  },
  {
    key: "output_class"
    value: { string_value: "false" }
  },
  {
    key: "storage_type"
    value: { string_value: "AUTO" }
  }
]
dynamic_batching { }

The model is serialized using xgboost==1.7.1 and was verified as follows:

import xgboost as xgb

# train_features / train_labels / features_3_rows_df are the reporter's own
# data and are not shown here; any small two-feature regression dataset
# reproduces the issue.
xgb_reg = xgb.XGBRegressor(n_estimators=20)
xgb_reg.fit(train_features, train_labels)
xgb_reg.save_model('./xgboost.json')

# The JSON artifact loads and predicts correctly in XGBoost itself.
xgb_reg_loaded = xgb.XGBRegressor()
xgb_reg_loaded.load_model('./xgboost.json')
xgb_reg_loaded.predict(features_3_rows_df)

Note: I am curious whether I am missing something in my setup or whether this is a bug. I would be happy to share the model artifact or the code to generate it.

Thanks

kiarash-rezahanjani commented 1 year ago

Looking through similar issues, I found the following, which might be related, though I am not entirely sure at the moment: https://github.com/rapidsai/cuml/issues/4715 -> https://github.com/rapidsai/cuml/pull/4752

wphicks commented 1 year ago

Thanks for reporting this, @kiarash-rezahanjani! XGBoost changed its JSON format in version 1.7, and we have not yet updated to a Treelite version which supports it. We should be able to get that updated in version 23.01. I'm going to update the title of this issue as a reminder to myself to make sure the update goes through and to update documentation accordingly.
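
As a quick, illustrative check (not something from this thread): XGBoost records the library version that wrote a JSON model in a top-level "version" field, so inspecting that field shows whether a given artifact was produced with the 1.7 schema.

import json

# The "version" field holds the XGBoost release that wrote the model,
# e.g. [1, 7, 1]; models written by XGBoost >= 1.7 use the newer schema.
with open('./xgboost.json') as f:
    model_json = json.load(f)
print(model_json.get('version'))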

wphicks commented 1 year ago

Oh, and I almost forgot: As a workaround, you should still be able to use XGBoost's binary format for serialization for now. Treelite should have an update shortly that will make it more robust to changes in XGBoost's JSON schema, so this hopefully won't crop up again too often.
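
A minimal sketch of that workaround, assuming the FIL backend's usual conventions for binary XGBoost models (a model file named xgboost.model and model_type set to "xgboost" in config.pbtxt; check the backend documentation to confirm):

import numpy as np
import xgboost as xgb

# Stand-in training data for illustration; substitute the real dataset.
X = np.random.rand(100, 2)
y = np.random.rand(100)

xgb_reg = xgb.XGBRegressor(n_estimators=20)
xgb_reg.fit(X, y)

# A filename without a .json (or .ubj) suffix makes XGBoost write its legacy
# binary format, which the current Treelite release can still parse.
xgb_reg.save_model('./xgboost.model')

The binary file then replaces 1/xgboost.json in the repository layout above, and the model_type parameter in config.pbtxt changes from "xgboost_json" to "xgboost".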

kiarash-rezahanjani commented 1 year ago

Great! Thanks for the quick update @wphicks

tylerhutcherson commented 1 year ago

Just adding a +1 here as I ran into the exact same thing: xgboost==1.7.1 with tritonserver docker version at 22.08. I'll use the binary format as a serialization workaround for now.

wphicks commented 1 year ago

Thanks very much for your patience on this @kiarash-rezahanjani and @tylerhutcherson. The RAPIDS 22.12 release is not quite out yet, but I went ahead and created #314 so we can build and test with it as soon as it's available. Once that PR passes CI, I'll point you to instructions for building locally in case you need this right away. Otherwise, the fix should appear in the 23.01 Triton release.