microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Performance degradation when a model fully falls back to MLAS from an EP #8812

Open | MaajidKhan opened this issue 3 years ago

MaajidKhan commented 3 years ago

Components
- This analysis compares the OpenVINO Execution Provider vs MLAS
- Model used: mlperf_ssd_mobilenet_300 ONNX model
- Tool used to run the model: onnxruntime_perf_test
- Platform: Linux (Ubuntu)

Describe the bug
When I run this model with OpenVINO-EP, the model is not fully supported by OpenVINO-EP on the CPU device, so 0 subgraphs can run on OpenVINO-EP and the whole model falls back to the default MLAS CPU Execution Provider.

What I observe is a significant performance difference when this model is run via OpenVINO-EP vs directly on MLAS. Keep in mind that even in the OpenVINO-EP case the model is still executed by MLAS, since the whole model falls back to the default CPU provider.
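For reference, the two routes look roughly like this through the Python API (a minimal sketch, assuming an onnxruntime build that includes the OpenVINO EP; the model path matches the perf_test commands below):

```python
import onnxruntime as ort

MODEL = "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx"

# OV-EP route: OpenVINO is asked first for subgraphs; any node it cannot
# claim (here: the entire graph) falls back to the CPU EP (MLAS).
sess_ov = ort.InferenceSession(
    MODEL, providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"]
)

# Default CPU route: the whole graph is assigned to MLAS from the start.
sess_cpu = ort.InferenceSession(MODEL, providers=["CPUExecutionProvider"])

# Shows which EPs each session actually registered.
print(sess_ov.get_providers())
print(sess_cpu.get_providers())
```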

Default CPU (ONNX Runtime optimizations enabled by default):
Total inference requests: 100
Total inference run time: 1.43431 s

command: ./onnxruntime_perf_test -r 100 -c 1 -e cpu mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx

OV-EP CPU_FP32 route (ONNX Runtime optimizations enabled by default):
Total inference requests: 100
Total inference run time: 3.38388 s

command: ./onnxruntime_perf_test -r 100 -c 1 -e openvino mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx

The reason the timings differ even though both runs eventually execute on MLAS is the early extra graph optimizations that ONNX Runtime applies when a model is run directly on MLAS.

Case 1: You run the model directly on MLAS (CPU Execution Provider). Before the inference stage, ONNX Runtime applies early graph optimizations such as reshape_fusion. These improve the performance numbers during the final inference over the same 100 iterations.

Case 2: You run the model with OpenVINO-EP CPU. In the get_capability stage it realizes it cannot support any subgraphs on OV-EP, so the whole model falls back to the default CPU provider (MLAS) and is executed there. However, those early ONNX Runtime optimizations are not applied to the model graph, since it came through the OV-EP route and only then fell back to the default CPU route.
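The optimization behavior in the two cases can also be toggled explicitly through SessionOptions; this sketch mirrors the perf_test '-o 0' flag used below (the default level being ORT_ENABLE_ALL is my understanding, not something stated in this thread):

```python
import onnxruntime as ort

so = ort.SessionOptions()

# Equivalent of perf_test's '-o 0': disable all graph optimizations,
# including fusions like the reshape_fusion mentioned above.
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

# The default ("optimizations enabled by default") is ORT_ENABLE_ALL:
# so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

sess = ort.InferenceSession(
    "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx",
    sess_options=so,
    providers=["CPUExecutionProvider"],
)
```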

Refer to the two logs attached to this ticket, which support the above analysis.

I also tried running this model with the ONNX Runtime optimizations (which are enabled by default) disabled. Then I see similar performance in both cases.

OV-EP CPU_FP32 route (ONNX Runtime optimizations disabled):
Total inference requests: 100
Total inference run time: 3.86726 s

command: ./onnxruntime_perf_test -r 100 -c 1 -o 0 -e openvino mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx

Default CPU (ONNX Runtime optimizations disabled):
Total inference requests: 100
Total inference run time: 3.82465 s

command: ./onnxruntime_perf_test -r 100 -c 1 -o 0 -e cpu mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx
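The same comparison can be reproduced outside onnxruntime_perf_test with a small Python timing loop (a sketch only; the 300 fallback for dynamic input dimensions is my assumption based on the model name, and dummy zero data stands in for real images):

```python
import time
import numpy as np
import onnxruntime as ort

MODEL = "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx"

def bench(providers, runs=100):
    sess = ort.InferenceSession(MODEL, providers=providers)
    inp = sess.get_inputs()[0]
    # Replace any symbolic/dynamic dims with 300 (assumed from the model name).
    shape = [d if isinstance(d, int) else 300 for d in inp.shape]
    dtype = np.uint8 if "uint8" in inp.type else np.float32
    feed = {inp.name: np.zeros(shape, dtype=dtype)}
    sess.run(None, feed)  # warm-up run, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, feed)
    return time.perf_counter() - start

print("default CPU:", bench(["CPUExecutionProvider"]))
print("OV-EP route:", bench(["OpenVINOExecutionProvider", "CPUExecutionProvider"]))
```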

So, I just wanted to understand: in the first scenario, where we see a significant performance difference with optimizations enabled, is this a bug or expected behavior? In the second scenario, with ONNX Runtime optimizations disabled, the performance is the same in both cases, which is understood.
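If the missing optimizations are indeed the cause, one workaround I can think of (my own suggestion, not something confirmed in this thread) would be to serialize an optimized copy of the model once via the CPU EP and then load that file on the OV-EP route; the output filename below is arbitrary:

```python
import onnxruntime as ort

# One-time offline step: let the CPU EP apply its graph optimizations and
# write the optimized graph to disk.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "ssd_mobilenet_optimized.onnx"  # assumed name
ort.InferenceSession(
    "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx",
    sess_options=so,
    providers=["CPUExecutionProvider"],
)
# Later runs (including the OV-EP route that falls back to MLAS) can then
# load "ssd_mobilenet_optimized.onnx" instead of the original model.
```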

Urgency: Low/Medium

System information
- Distributor ID: Ubuntu
- Description: Ubuntu 18.04.5 LTS
- Release: 18.04
- Codename: bionic
- Model name: Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
- Python 3.6.9
- cmake 3.20.3
- gcc 7.5

To Reproduce
Run the onnxruntime_perf_test commands listed above.

Screenshots
- cpu_op10_mlperf_ssd_mobilenet_300_log.txt
- ov_ep_op10_mlperf_ssd_mobilenet_300_log.txt
- default_cpu_vs_op_ep

jywu-msft commented 3 years ago

Thanks for reporting. Will investigate.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.