microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Very slow load of ONNX model on Windows #22219

Status: Open. dhatraknilam opened this issue 2 weeks ago.

dhatraknilam commented 2 weeks ago

Describe the issue

I am trying to load XGBoost ONNX models with onnxruntime on a Windows machine. The model file is 52 MB, but loading it consumes 1378.9 MB of RAM and takes 15 minutes! This happens only on Windows; on Linux the same models load in a few seconds, although memory consumption is high there as well.

I tried the solution suggested in https://github.com/microsoft/onnxruntime/issues/3802#issuecomment-624464802 but got this error: `AttributeError: 'onnxruntime.capi.onnxruntime_pybind11_state.SessionOptions' object attribute 'graph_optimization_level' is read-only`

This is the simple code I used to load the model: `sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])`

To reproduce

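Train an XGBoost classification model with the following params:

```python
# Classifier
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# Register the XGBoost converter so skl2onnx knows how to convert XGBClassifier.
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

# x_train, x_test (pandas DataFrames) and y_train are prepared elsewhere.
x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputClassifier(XGBClassifier(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```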

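Train an XGBoost regressor model with the following params:

```python
# Regressor
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from skl2onnx import convert_sklearn, update_registered_converter
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_regressor_output_shapes
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from xgboost import XGBRegressor

# Register the XGBoost converter so skl2onnx knows how to convert XGBRegressor.
update_registered_converter(
    XGBRegressor,
    "XGBoostXGBRegressor",
    calculate_linear_regressor_output_shapes,
    convert_xgboost,
)

param = {'n_estimators': 3435, 'max_delta_step': 6, 'learning_rate': 0.030567232354470994, 'base_score': 0.700889637773676, 'scale_pos_weight': 0.29833333651319716, 'booster': 'gbtree', 'reg_lambda': 0.0005531812782988272, 'reg_alpha': 4.8213852607021606e-05, 'subsample': 0.9816268623744107, 'colsample_bytree': 0.3187040821569215, 'max_depth': 17, 'min_child_weight': 2, 'eta': 6.2582977222245746e-06, 'gamma': 2.2248460288603035e-07, 'grow_policy': 'depthwise'}

# x_train, x_test (pandas DataFrames) and y_train are prepared elsewhere.
x_train.columns = range(x_train.shape[1])
x_test.columns = range(x_train.shape[1])

pipe = Pipeline([("xgb", MultiOutputRegressor(XGBRegressor(**param)))])
pipe.fit(x_train.to_numpy(), y_train)

model_onnx = convert_sklearn(
    pipe,
    "pipeline_xgboost",
    [("input", FloatTensorType([None, x_train.shape[1]]))],
    verbose=1,
    target_opset={"": 12, "ai.onnx.ml": 2},
    options={type(pipe): {'zipmap': False}},
)

with open("modelname.onnx", "wb") as f:
    f.write(model_onnx.SerializeToString())
```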

Load the model with the following code and observe the load time and RAM usage: `sess = rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"])`
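To report load time and memory as numbers that can be compared between Windows and Linux, the session construction can be wrapped in a small timer. A minimal sketch: `measure_load` is a hypothetical helper, wall-clock time comes from `time.perf_counter`, and the memory figure is the change in resident set size reported by the third-party `psutil` package (`None` when `psutil` is not installed). The RSS delta is a rough proxy for the memory the load added, not a peak measurement.

```python
import time
from typing import Any, Callable, Optional, Tuple

try:
    import psutil  # third-party, optional: only used to read the process RSS
except ImportError:
    psutil = None


def _rss_mb() -> Optional[float]:
    """Current resident set size of this process in MB, or None without psutil."""
    if psutil is None:
        return None
    return psutil.Process().memory_info().rss / (1024 * 1024)


def measure_load(load: Callable[[], Any]) -> Tuple[Any, float, Optional[float]]:
    """Hypothetical helper: run `load`, return (result, seconds, RSS delta in MB)."""
    before = _rss_mb()
    start = time.perf_counter()
    result = load()
    elapsed = time.perf_counter() - start
    after = _rss_mb()
    delta = None if before is None or after is None else after - before
    return result, elapsed, delta
```

Usage would be `sess, seconds, rss_mb = measure_load(lambda: rt.InferenceSession(modelSav_path, providers=["CPUExecutionProvider"]))`.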

Urgency

This is a release-critical issue: we can't deliver these models with such slow loading. Although the models perform well, we are stuck on the load time. We also considered using other libraries to package the ML models, but we don't have the necessary compliance approval for them, and we trust Microsoft.

Platform

Windows

OS Version

11

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

xadupre commented 1 week ago

This PR should solve this: https://github.com/microsoft/onnxruntime/pull/22043.

dhatraknilam commented 1 week ago


Thanks @xadupre for the prompt response; I will try it and update here.