yeliang2258 opened this issue 2 years ago
I found that almost all quantized models show this phenomenon: turning off the optimizations aligns the results with the float model, while turning on the optimizations leads to a drop in accuracy.
What kind of CPU do you run the model on? Could you please try quantizing the model with the u8u8 format (both activations and weights uint8)? See https://onnxruntime.ai/docs/performance/quantization.html#when-and-why-do-i-need-to-try-u8u8
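For reference, a minimal sketch of requesting the u8u8 format with the ORT static quantization tool; the model paths and the calibration reader below are placeholders you would replace with your own:

```python
# Minimal u8u8 quantization sketch. "model_fp32.onnx", "model_u8u8.onnx" and the
# calibration samples are placeholders for your own model and data.
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class MyCalibrationReader(CalibrationDataReader):
    """Feeds a few representative inputs to the calibrator."""

    def __init__(self, samples):
        # samples: list of dicts mapping input names to numpy arrays
        self._iter = iter(samples)

    def get_next(self):
        return next(self._iter, None)


quantize_static(
    "model_fp32.onnx",
    "model_u8u8.onnx",
    calibration_data_reader=MyCalibrationReader(samples=[]),  # supply real samples here
    activation_type=QuantType.QUInt8,  # uint8 activations
    weight_type=QuantType.QUInt8,      # uint8 weights -> the "u8u8" format
)
```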
My CPU type is Intel(R) Xeon(R) Gold 6271C.
It doesn't have VNNI instructions. How did you generate the model, with the ORT quantization tool or tf2onnx? If you are using the ORT quantization tool, could you please try quantizing the model with the u8u8 format and see if the accuracy gets better?
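As a side note, one quick way to check whether the CPU reports the VNNI flags (Linux only, just a convenience sketch):

```python
# Convenience check (Linux only): look for the VNNI flags in /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = f.read()

print("AVX512-VNNI:", "avx512_vnni" in flags)
print("AVX-VNNI:", "avx_vnni" in flags)
```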
Thank you for your reply. I ran it again and found that there is indeed no accuracy drop on the VNNI machine. Also, may I ask: can a symmetrically quantized model be converted to a u8u8-format ONNX quantized model?
Thanks for your confirmation! So you converted the quantized model from TFLite. Yes, it can be converted to u8u8: replace the int8 zero point in each Q/DQ node with a uint8 one by adding 128, and apply a similar shift to the weights of Conv and Gemm/MatMul.
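To make the idea concrete, here is a rough, untested sketch of that +128 shift on a QDQ model. It assumes a simple per-tensor QDQ layout where the zero points and the quantized weights are stored as int8 initializers; the model paths are placeholders:

```python
# Rough sketch: shift an s8s8 QDQ model toward u8u8 by adding 128 to int8 zero points
# and int8 weight initializers. Assumes per-tensor quantization; untested.
import numpy as np
import onnx
from onnx import TensorProto, numpy_helper

model = onnx.load("model_s8s8.onnx")  # placeholder path
inits = {init.name: init for init in model.graph.initializer}


def shift_s8_to_u8(name):
    """Replace an int8 initializer with the equivalent uint8 one (value + 128)."""
    init = inits.get(name)
    if init is None or init.data_type != TensorProto.INT8:
        return
    shifted = (numpy_helper.to_array(init).astype(np.int32) + 128).astype(np.uint8)
    init.CopyFrom(numpy_helper.from_array(shifted, name))


for node in model.graph.node:
    if node.op_type in ("QuantizeLinear", "DequantizeLinear"):
        if len(node.input) > 2:
            shift_s8_to_u8(node.input[2])  # zero point: int8 -> uint8
        if node.op_type == "DequantizeLinear":
            shift_s8_to_u8(node.input[0])  # quantized weight, if stored as an initializer

onnx.save(model, "model_u8u8.onnx")  # placeholder path
```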
We will add an option in ORT to run an s8s8 model with u8u8 kernels natively on x64 for this kind of case.
Describe the bug
I have a quantized model. When all optimizations are turned on, the accuracy drops by 5 points, but when all optimizations are turned off, the accuracy does not drop. What could be the cause? Looking forward to your reply.
System information
To Reproduce
When all optimizations are turned on, the accuracy drops by 5 points:
When all optimizations are turned off, the accuracy does not drop:
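For reference, one way to toggle this from the Python API (the model path, input name, and input shape below are guesses you would adapt to your model):

```python
# Sketch: run the same quantized model with graph optimizations enabled vs. disabled.
# The model path, input name and input shape are placeholders.
import numpy as np
import onnxruntime as ort

model_path = "mobilenet_onnx_quant_model.onnx"
feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}  # adjust to your model

for level in (ort.GraphOptimizationLevel.ORT_ENABLE_ALL,
              ort.GraphOptimizationLevel.ORT_DISABLE_ALL):
    opts = ort.SessionOptions()
    opts.graph_optimization_level = level
    sess = ort.InferenceSession(model_path, sess_options=opts, providers=["CPUExecutionProvider"])
    out = sess.run(None, feed)
    print(level, out[0].flatten()[:5])
```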
Additional context
My quantized model: mobilenet_onnx_quant_model.onnx.zip