MaajidKhan opened this issue 3 years ago
Thanks for reporting. Will investigate.
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Components
- This analysis compares the OpenVINO Execution Provider against MLAS (the default CPU Execution Provider).
- Model used: mlperf_ssd_mobilenet_300 ONNX model
- Tool used to run the model: onnxruntime_perf_test
- Platform: Linux (Ubuntu)
Describe the bug
When I tried running this model with OpenVINO-EP, it turned out the model is not fully supported by OpenVINO-EP on the CPU device, so 0 subgraphs can run on OpenVINO-EP and the whole model falls back to the default MLAS CPU Execution Provider.
What I observe is a significant performance difference when this model is run using OpenVINO-EP vs MLAS. Keep in mind that even in the OpenVINO-EP case the model is still running on MLAS, since the whole model falls back to the default provider.
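For reference, the same two routes can be reproduced with the onnxruntime Python API instead of onnxruntime_perf_test (a minimal sketch, assuming an ORT build with the OpenVINO EP enabled):

```python
import onnxruntime as ort

model = "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx"

# Route 1: default CPU Execution Provider (MLAS) only.
cpu_sess = ort.InferenceSession(model, providers=["CPUExecutionProvider"])

# Route 2: request OpenVINO-EP first; ORT assigns any node OV-EP cannot
# take to the next provider in the list (here: every node falls back).
ov_sess = ort.InferenceSession(
    model,
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

# get_providers() only shows the registered providers in priority order,
# not the per-node assignment.
print(cpu_sess.get_providers())
print(ov_sess.get_providers())
```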
default CPU (ONNX Runtime optimizations enabled by default):
Total inference requests: 100
Total inference run time: 1.43431 s
command: ./onnxruntime_perf_test -r 100 -c 1 -e cpu mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx

OV-EP CPU_FP32 route (ONNX Runtime optimizations enabled by default):
Total inference requests: 100
Total inference run time: 3.38388 s
command: ./onnxruntime_perf_test -r 100 -c 1 -e openvino mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx
The reason there is a difference in performance timings, even though both runs eventually execute on MLAS, is the early graph optimizations ONNX Runtime applies when a model is run directly on MLAS.
Case 1: You run the model directly on MLAS (CPU Execution Provider). Before the inference stage, ONNX Runtime applies its early graph optimizations (reshape fusion and more) to the model. This improves the performance numbers measured during the final inference over the same 100 iterations.

Case 2: You run the model with OpenVINO-EP on CPU. In the GetCapability stage, ORT realizes that no subgraphs can be supported on OV-EP, so the whole model falls back to the default CPU provider (MLAS) and is executed there. But in this route those early ONNX Runtime optimizations are not applied to the model graph, since the model came in through the OV-EP path and only then fell back to the default CPU path.
Refer to the two logs attached to the ticket, which support the above analysis.
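Outside of those logs, the per-node placement can also be checked by turning on verbose session logging; a minimal sketch with the Python API (the exact wording of the placement lines varies across ORT versions):

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Severity 0 = VERBOSE. During session initialization ORT then logs
# which execution provider each node was assigned to, which is how the
# full fallback to the CPU EP shows up.
so.log_severity_level = 0

sess = ort.InferenceSession(
    "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx",
    sess_options=so,
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
```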
I also tried running this model with the ONNX Runtime optimizations (usually enabled by default) disabled. Then I see similar performance in both cases (a Python equivalent of the -o 0 flag is sketched after the commands below).
OV-EP CPU_FP32 route (ONNX Runtime optimizations disabled):
Total inference requests: 100
Total inference run time: 3.86726 s
command: ./onnxruntime_perf_test -r 100 -c 1 -o 0 -e openvino mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx

default CPU (ONNX Runtime optimizations disabled):
Total inference requests: 100
Total inference run time: 3.82465 s
command: ./onnxruntime_perf_test -r 100 -c 1 -o 0 -e cpu mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx
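The -o 0 flag maps to the graph optimization level in SessionOptions; the Python equivalent of the disabled-optimizations runs would be:

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Equivalent of onnxruntime_perf_test's "-o 0": disable all ORT graph
# optimizations so both routes see the same unoptimized graph.
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

sess = ort.InferenceSession(
    "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx",
    sess_options=so,
    providers=["CPUExecutionProvider"],  # or OV-EP first, as above
)
```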
So, I just want to understand: for the first scenario, where we see a significant performance difference with optimizations enabled, is this a bug or expected behavior? In the second scenario, with ONNX Runtime optimizations disabled, performance is the same in both cases (which is understood).
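For what it's worth, one way to test whether the missing early optimizations fully explain the gap (a sketch I have not verified; the output filename is made up) would be to save an offline-optimized copy of the model with the default CPU EP and then feed that pre-optimized model to the OV-EP route:

```python
import onnxruntime as ort

model = "mlperf_ssd_mobilenet_300/ssd_mobilenet_v1_coco_2018_01_28.onnx"

# Step 1: let ORT apply its graph optimizations once and write the
# optimized graph to disk (same machine, so higher levels should be safe).
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "ssd_mobilenet_optimized.onnx"  # hypothetical name
ort.InferenceSession(model, sess_options=so, providers=["CPUExecutionProvider"])

# Step 2: run the pre-optimized model through the OV-EP route. Even if
# everything still falls back to MLAS, the graph-level optimizations
# are already baked into the model.
sess = ort.InferenceSession(
    "ssd_mobilenet_optimized.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
```

If this run's timing matches the plain MLAS run (~1.4 s for 100 requests), that would confirm the analysis above.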
Urgency
Low/Medium
System information
- Distributor ID: Ubuntu
- Description: Ubuntu 18.04.5 LTS
- Release: 18.04
- Codename: bionic
- Model name: Intel(R) Core(TM) i9-9960X CPU @ 3.10GHz
- Python 3.6.9
- cmake 3.20.3
- gcc 7.5
To Reproduce
Run the onnxruntime_perf_test commands listed above on the mlperf_ssd_mobilenet_300 model, with and without the -o 0 flag.
Screenshots
- cpu_op10_mlperf_ssd_mobilenet_300_log.txt
- ov_ep_op10_mlperf_ssd_mobilenet_300_log.txt