microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Performance degradation observed w.r.t DNNL-EP in v1.15.1 compared to v1.13.1 #16609

Open gitgani opened 1 year ago

gitgani commented 1 year ago

Describe the issue

Performance degradation observed w.r.t DNNL-EP in v1.15.1 compared to v1.13.1

To reproduce

Prerequisites:

  1. Create a new conda environment with Python 3.8

  2. Download onnxruntime (https://github.com/microsoft/onnxruntime/tree/v1.13.1)

  3. Build v1.13.1 tag with --use_dnnl option

  4. Test sample resnet50v1.onnx

  5. Create a new conda environment with Python 3.8

  6. Download onnxruntime (https://github.com/microsoft/onnxruntime/tree/v1.15.1)

  7. Build v1.15.1 tag with --use_dnnl option

  8. Test sample resnet50v1.onnx again and compare latencies
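The latency comparison in the steps above can be sketched with a small timing helper (a hypothetical sketch; `measure_latency` is not part of ONNX Runtime — in practice `run_fn` would wrap the `run` method of an `onnxruntime.InferenceSession` built with the DNNL EP):

```python
import time
import statistics

def measure_latency(run_fn, feed, runs=100, warmup=10):
    """Average per-call latency in milliseconds for a callable.

    run_fn: any callable taking a feed dict, e.g. a wrapper around
            onnxruntime.InferenceSession.run(None, feed).
    """
    for _ in range(warmup):  # warm up caches and thread pools first
        run_fn(feed)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_fn(feed)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.mean(samples)

# Demonstration with a dummy workload standing in for session.run:
def dummy_run(feed):
    return sum(feed["x"])

avg_ms = measure_latency(dummy_run, {"x": list(range(1000))}, runs=20)
print(f"avg latency: {avg_ms:.3f} ms")
```

Running the same helper against sessions built from the v1.13.1 and v1.15.1 wheels isolates the EP difference from measurement noise.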

On our x86 machine we see a performance degradation of up to 100%+ for inference in stream/latency mode. Per-sample inference latency for ResNet50 on the ImageNet dataset increased from 14.1 ms to 28.3 ms after moving from v1.13.1 to v1.15.1.

OMP details: LLVM OpenMP openmp-10.0.1 (built and loaded from https://prereleases.llvm.org/10.0.1/), with:

- `export GOMP_CPU_AFFINITY=0-63`
- `export OMP_NUM_THREADS=64`
- `export OMP_WAIT_POLICY=ACTIVE`
- `export OMP_PROC_BIND=FALSE`
- `export OMP_DYNAMIC=FALSE`

Urgency

A performance regression is observed, and it is blocking our move to v1.15.1.

Platform

Linux

OS Version

Ubuntu 20.04.5 LTS

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

https://github.com/microsoft/onnxruntime/tree/v1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

oneDNN; DNNL-EP

Execution Provider Library Version

oneDNN v3.0

Model File

https://github.com/onnx/models/blob/main/vision/classification/resnet/model/resnet50-v1-12.onnx

Is this a quantized model?

No

xadupre commented 1 year ago

It may be an issue with the optimizers doing different things for this model. Did you compare the optimized models ? (https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.optimized_model_filepath).

gitgani commented 1 year ago

I believe the perf drop is related to the DNNL_OPENMP build flag. After disabling the flag and rebuilding the DNNL-EP, performance returns to the original numbers (as in v1.13.1). Could the relevant team please re-check and revert?

gitgani commented 1 year ago

> It may be an issue with the optimizers doing different things for this model. Did you compare the optimized models? (https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.optimized_model_filepath)

Yes, we've checked. Currently, the perf drop occurs across all models (resnet50v1.onnx is just one example). The optimization flag used is GraphOptimizationLevel::ORT_DISABLE_ALL.

xadupre commented 1 year ago

I assume this is due to this PR #13618.

@eralmual, could you take a look?