dhatraknilam opened this issue 2 months ago
This PR should solve this: https://github.com/microsoft/onnxruntime/pull/22043.
Thanks @xadupre for the prompt response; I will try it and update here.
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
I am trying to load XGBoost ONNX models with onnxruntime on a Windows machine. The model file is 52 MB, yet it consumes 1378.9 MB of RAM on loading, and loading takes 15 minutes! This behavior is observed only on Windows; on Linux the models load in a few seconds, although memory consumption is high on Linux as well.
I tried the solution suggested in https://github.com/microsoft/onnxruntime/issues/3802#issuecomment-624464802 but I get this error:

```
AttributeError: 'onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions' object attribute 'graph_optimization_level' is read-only
```
This is the simple code I used to load the model:

```python
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])
```
To reproduce
Train an XGBoost classification model with the following params:

```python
# Classifier
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputClassifier(XGBClassifier(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```
Train an XGBoost regressor model with the following params:

```python
# Regressor
update_registered_converter(
    XGBRegressor,
    "XGBoostXGBRegressor",
    calculate_linear_regressor_output_shapes,
    convert_xgboost,
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputRegressor(XGBRegressor(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
    options={type(pipe): {"zipmap": False}},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```
Load the model with the following code:

```python
sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])
```

and observe the load time and RAM usage.

Urgency
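To put a number on the load time, a small timing helper can wrap the session construction. A sketch (`modelSav_path` is the path from this issue, so the onnxruntime call is left commented out; the helper itself is demonstrated on a plain function):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# Hypothetical usage with onnxruntime (requires the model file):
# sess, seconds = timed(rt.InferenceSession, modelSav_path,
#                       providers=["CPUExecutionProvider"])
# print(f"session load took {seconds:.1f}s")

# Self-contained demonstration of the helper:
result, seconds = timed(sum, range(1000))
print(result)  # 499500
```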
This is a release-critical issue, since we can't ship these models with such poor loading performance. Although the models themselves perform well, we are blocked by the load time. We also considered other libraries for packaging the ML models, but we don't have the necessary compliance approvals for them, and we trust Microsoft.
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.18.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No