microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Feature Request] Missing optimization of DequantizeLinear ∘ Flatten ∘ QuantizeLinear? #21375

Open mcollinswisc opened 3 months ago

mcollinswisc commented 3 months ago

Discussed in https://github.com/microsoft/onnxruntime/discussions/21167

Originally posted by **mcollinswisc** June 25, 2024

It looks like ONNX Runtime will optimize DequantizeLinear ∘ Reshape ∘ QuantizeLinear down to just the Reshape, eliminating the quantize/dequantize pair when the scales and zero points are the same. However, an equivalent Flatten is not optimized. Is this likely to be just a missing optimization, or is there some reason the QDQ pair would be preserved in this case?

Tested in https://gist.github.com/mcollinswisc/d1cd9d13b4e5fbad01c75dca5c9ca576 with ONNX Runtime 1.18.0.
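For context on why dropping the Q/DQ pair is sound: QuantizeLinear ∘ DequantizeLinear with matching scale and zero point is the identity on the quantized values, and Flatten (like Reshape) only rearranges shape without touching values. A minimal pure-Python sketch, using the ONNX-spec formulas `x = (q - zero_point) * scale` and `q = saturate(round(x / scale) + zero_point)` (the values below are illustrative, not taken from the gist):

```python
def dequantize(q, scale, zero_point):
    # DequantizeLinear: x = (q - zero_point) * scale
    return [(v - zero_point) * scale for v in q]

def quantize(x, scale, zero_point, lo=-128, hi=127):
    # QuantizeLinear (int8): q = saturate(round(x / scale) + zero_point)
    return [max(lo, min(hi, round(v / scale) + zero_point)) for v in x]

# An illustrative int8 tensor of shape (2, 3) with per-tensor scale/zero-point.
q = [[-5, 0, 7], [120, -128, 3]]
scale, zp = 0.1, 2

# Flatten applied directly to the quantized integers.
flat_q = [v for row in q for v in row]

# DequantizeLinear -> Flatten -> QuantizeLinear with identical scale/zero-point.
roundtrip = quantize(dequantize(flat_q, scale, zp), scale, zp)

# The round trip is the identity, so the optimizer can keep only the Flatten.
assert roundtrip == flat_q
```

This is exactly the invariant the existing Reshape rule relies on, and it holds for Flatten for the same reason.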
skottmckay commented 3 months ago

Should be possible to add Flatten to this list, given that the ONNX spec for Flatten allows 8-bit integer inputs: https://github.com/microsoft/onnxruntime/blob/0f1f3b7705ddc2fe4f371f78a8a8b6a0428a68de/onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc#L63-L68