Open snadampal opened 7 months ago
The script quantizes a model once and produces a quantized model. Then it starts to quantize again, this time with the quantized model as input. Because this model contains operators from the domain com.microsoft, shape_inference cannot infer the shape for any node past the first com.microsoft operator. Before PR #18043 this was not an issue, as the type to quantize was always float. Now that it can be float16 as well, this information is needed. I can think of two fixes: use the symbolic shape inference implemented in onnxruntime, assuming it supports the nodes from domain com.microsoft, or use a default type inferred from whatever information the regular shape inference does provide (I would probably take the most frequent float type among the available types).
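To illustrate the second option, here is a minimal sketch of picking the most frequent float type among the types that shape inference did manage to return. The helper name `most_frequent_float_type` and the `known` dictionary are hypothetical and only for illustration; this is not the code from onnxruntime. The integer codes mirror the values of `onnx.TensorProto` data types so that the snippet runs without onnx installed.

```python
from collections import Counter

# Integer codes matching onnx.TensorProto data types.
INT8, FLOAT, FLOAT16, DOUBLE = 3, 1, 10, 11
FLOAT_TYPES = {FLOAT, FLOAT16, DOUBLE}


def most_frequent_float_type(known_types, fallback=FLOAT):
    """Hypothetical helper: choose a default element type.

    known_types maps tensor names to the element types that shape
    inference was able to determine. Non-float types are ignored.
    If no float type is known at all, return the fallback.
    """
    counts = Counter(t for t in known_types.values() if t in FLOAT_TYPES)
    if not counts:
        return fallback
    # most_common(1) returns [(type, count)] for the top entry.
    return counts.most_common(1)[0][0]


# Assumed example: types recovered before the first com.microsoft node.
known = {"input": FLOAT16, "w1": FLOAT16, "bias": FLOAT, "q_w1": INT8}
default_type = most_frequent_float_type(known)
```

With these assumed inputs the helper selects float16, because it appears twice while float appears once and int8 is ignored.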
Hi @xadupre, thanks for looking into it. Is there any chance of targeting the fix for the onnxruntime 1.17.1 milestone?
PR #19455 would let you define a default type. I modified the code to pick TensorProto.FLOAT since all the code in that subfolder was implemented assuming models were using this type.
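The fallback behaviour described here can be sketched roughly as follows. This is an assumption about the general idea (use the inferred type when there is one, otherwise the user-defined default), not the actual code of the PR; `resolve_weight_type` and its parameters are hypothetical names.

```python
FLOAT = 1  # same value as onnx.TensorProto.FLOAT


def resolve_weight_type(name, inferred_types, default_type=None):
    """Hypothetical helper: return the element type to use for a weight.

    inferred_types maps tensor names to the types shape inference
    returned; a missing name means inference could not determine it.
    """
    tensor_type = inferred_types.get(name)
    if tensor_type is not None:
        return tensor_type
    if default_type is None:
        # No inferred type and no default: the caller has to decide.
        raise RuntimeError(
            f"Unable to find the data type for {name!r}; "
            "define a default type to continue."
        )
    return default_type


# The weight sits after a com.microsoft node, so its type is unknown.
weight_type = resolve_weight_type("w_after_custom_op", {}, default_type=FLOAT)
```

Without a default, the same call raises instead of guessing, which keeps a wrong type from going unnoticed.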
Describe the issue
Onnxruntime transformers benchmarking is failing for int8 quantized inference. The same benchmark works fine with onnxruntime 1.16.3. I added the error details below. I found that the commit below (commit c8399a81fed9c114c43daf2103fee48d6b02bdd7: adding support for float16 weights quantization) is causing the break, but the commit already says the feature will not work with onnx 1.15. So I tried with onnx-weekly (onnx-weekly 1.16.0.dev20240130), but the issue still exists. My question is: which onnx version is required to make this test work again?
Error:
Commit that introduced this:
To reproduce
Urgency
It is blocking the development and testing of new features, because the tests no longer work.
Platform
Linux
OS Version
Ubuntu 22.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response