Open flytair opened 1 month ago
Should {DQ, Pad, Q} be fused, or elided into simply {Pad}? (similarly, I could see many other operators like slice where the preceding and following DQ and Q are elidable)
As I understand it, the Pad operator is independent of data type, so the Q and DQ operators should not be necessary. Can anyone correct me if I'm wrong?
I see your screenshot shows different scales and zero points for the entering DQ and exiting Q, meaning that would at least require the pad to be followed by a linear rescaling and adjustment, rather than complete elision. (Yufeng Li knows much more about this than I do)
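To make the point about differing scales and zero points concrete, here is a minimal pure-Python sketch (not ONNX Runtime code). It compares the reference path DQ -> Pad -> Q with a Pad done directly on the int8 values followed by a linear rescale. The helper names, the 1-D constant-mode padding, and the scale/zero-point values are all made up for illustration; the scales are powers of two so the arithmetic is exact. In general the two paths may differ by a rounding step, for example when the pad value is not exactly representable on the input grid.

```python
# Illustrative only: helper names and quantization parameters are hypothetical.

def clamp_int8(v):
    """Saturate to the int8 range, as QuantizeLinear does for int8 outputs."""
    return max(-128, min(127, v))

def quantize(xs, scale, zero_point):
    # Python's round() rounds half to even, matching QuantizeLinear's rounding.
    return [clamp_int8(round(x / scale) + zero_point) for x in xs]

def dequantize(qs, scale, zero_point):
    return [(q - zero_point) * scale for q in qs]

def pad1d(xs, before, after, value):
    """Constant-mode padding of a 1-D list."""
    return [value] * before + list(xs) + [value] * after

def dq_pad_q(q_in, s_in, zp_in, s_out, zp_out, before, after, pad_value=0.0):
    """Reference path: dequantize, pad in float, quantize with the output params."""
    x = dequantize(q_in, s_in, zp_in)
    return quantize(pad1d(x, before, after, pad_value), s_out, zp_out)

def pad_then_rescale(q_in, s_in, zp_in, s_out, zp_out, before, after, pad_value=0.0):
    """Pad the int8 values directly, then apply a linear rescale and offset."""
    q_pad = quantize([pad_value], s_in, zp_in)[0]  # pad value on the input grid
    padded = pad1d(q_in, before, after, q_pad)
    ratio = s_in / s_out
    return [clamp_int8(round((q - zp_in) * ratio) + zp_out) for q in padded]

q_in = [-10, 0, 3, 20, 100]
s_in, zp_in = 0.5, 3      # assumed input quantization parameters
s_out, zp_out = 0.25, -5  # assumed (different) output quantization parameters

reference = dq_pad_q(q_in, s_in, zp_in, s_out, zp_out, before=2, after=1)
rescaled = pad_then_rescale(q_in, s_in, zp_in, s_out, zp_out, before=2, after=1)
naive = pad1d(q_in, 2, 1, quantize([0.0], s_in, zp_in)[0])  # Pad only, no rescale

print(reference == rescaled)  # Pad + rescale reproduces DQ -> Pad -> Q here
print(reference == naive)     # plain elision does not when the params differ
```

With identical input and output parameters, the plain integer Pad already matches the reference in this sketch, which is the only case where dropping Q and DQ entirely would be safe.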
To remove the quantization and dequantization (Q and DQ) operations and add a linear rescaling after Pad, is code modification needed, or are there existing high-level interfaces that can be reused?
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
The DequantizeLinear, Pad, and QuantizeLinear operations in a statically quantized model are not fused into one operation when using the optimization level ORT_ENABLE_EXTENDED. My understanding is that the Pad operator should be independent of data type, so I don't understand why DequantizeLinear and QuantizeLinear are needed before and after the Pad operation, as shown in the figure below.
To reproduce
Statically quantize a model that includes a Pad operation, then optimize it using the optimization level ORT_ENABLE_EXTENDED.
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
enc_int8_static_extended_opt.zip
Is this a quantized model?
Yes