microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Bad performance for QDQ model with openvino EP #11604

Open mengniwang95 opened 2 years ago

mengniwang95 commented 2 years ago

Describe the bug: Hi, I use the OpenVINO EP to test QDQ model performance, but I find that the QDQ model's performance is worse than the original fp32 model's.
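A minimal sketch of how such a comparison might be run with the ONNX Runtime Python API; the model file names, input-shape fallback, and run count are assumptions, not taken from the original report:

```python
# Hedged latency-comparison sketch (not from the original report).
# "model_fp32.onnx" / "model_qdq.onnx" are hypothetical file names.
import time
import numpy as np
import onnxruntime as ort

def bench(model_path, runs=100):
    sess = ort.InferenceSession(
        model_path, providers=["OpenVINOExecutionProvider"])
    inp = sess.get_inputs()[0]
    # Replace dynamic dimensions with 1; adjust to your model's real shape.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.random.rand(*shape).astype(np.float32)
    sess.run(None, {inp.name: x})  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {inp.name: x})
    return (time.perf_counter() - start) / runs

print("fp32 avg latency:", bench("model_fp32.onnx"))
print("qdq  avg latency:", bench("model_qdq.onnx"))
```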

System information

mengniwang95 commented 2 years ago

I cannot upload the model file... I am not sure why that is.

mengniwang95 commented 2 years ago

qdq.zip: this is the QDQ model.

yufenglee commented 2 years ago

@jywu-msft, does OpenVINO support QDQ natively?

jywu-msft commented 2 years ago

> @jywu-msft, does OpenVINO support QDQ natively?

It is supported, but maybe it's not optimized for this particular model. Will raise with Intel.

jywu-msft commented 2 years ago

@sfatimar, would you be able to take a look at the qdq model?

sfatimar commented 2 years ago

How was this QDQ model generated? If you want good performance, you can use Intel's NNCF to generate the QDQ model, which gives very good performance: https://github.com/openvinotoolkit/nncf. We can help you use the Post-Training Quantization or Quantization-Aware Training (QAT) features of NNCF.

mengniwang95 commented 2 years ago

I got an fp32 ONNX model first, and then generated the QDQ model from it. How does NNCF generate a QDQ model?
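The issue doesn't say which tool produced this QDQ model, but for reference, ONNX Runtime's own quantization tool is one common way to get a QDQ model from an fp32 model. A hedged sketch, with hypothetical file names, input name, and shape, and random calibration data standing in for real samples:

```python
# Sketch of static quantization to QDQ format with onnxruntime.quantization.
# The model paths, input name, and shape below are placeholders.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static)

class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration; real use needs real data."""
    def __init__(self, input_name, shape, n=8):
        self._data = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(n)])
    def get_next(self):
        return next(self._data, None)

quantize_static(
    "model_fp32.onnx",   # input fp32 model (hypothetical name)
    "model_qdq.onnx",    # output with QuantizeLinear/DequantizeLinear pairs
    RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QDQ,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```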

sfatimar commented 2 years ago

Please go through this tool, which is currently in development: https://github.com/openvinotoolkit/nncf/tree/develop/examples/experimental/onnx.
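For illustration, a hedged sketch of NNCF post-training quantization on an ONNX model using the nncf.quantize API (the linked example was experimental at the time of this issue, so details may differ); the file names and random calibration samples are placeholders:

```python
# Sketch of NNCF post-training quantization for an ONNX model.
# "model_fp32.onnx" and the calibration samples are hypothetical.
import numpy as np
import onnx
import nncf

model = onnx.load("model_fp32.onnx")
input_name = model.graph.input[0].name

# A handful of representative inputs; real calibration needs real data.
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32)
           for _ in range(8)]
calibration_dataset = nncf.Dataset(samples, lambda x: {input_name: x})

# Produces a quantized model with QuantizeLinear/DequantizeLinear nodes.
quantized_model = nncf.quantize(model, calibration_dataset)
onnx.save(quantized_model, "model_qdq_nncf.onnx")
```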