marksein07 opened this issue 4 years ago
@marksein07, QInt8 should give slightly better performance. Could you share the OS and CPU information of the machine on which you get worse perf with QInt8?
@yufenglee To clarify, it's the deep learning model's prediction accuracy that drops significantly; speed-wise, QInt8 does indeed outperform QUInt8, as you say.
The two quantized models come from the same optimized ONNX model above, and the only difference between them is
weight_type = QuantType.QInt8
versus
weight_type = QuantType.QUInt8
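A minimal sketch of the two calls (the file names are placeholders; the actual script was not shared in the thread):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Same optimized source model, differing only in the weight type.
quantize_dynamic("model_opt.onnx", "model_s8.onnx", weight_type=QuantType.QInt8)
quantize_dynamic("model_opt.onnx", "model_u8.onnx", weight_type=QuantType.QUInt8)
```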
I understand that a quantized model loses some fidelity. However, the prediction is run on the training dataset, so accuracy shouldn't drop considerably.
I almost copy-pasted the sample code, yet when verifying the quantized model's accuracy I got a totally different result from the sample.
My OS is Ubuntu 18.04, ONNX version 1.7.0, ONNX Runtime version 1.5.2, and CPU info:
Thanks!
@marksein07 , thanks for your clarification. Which version are you using and could you share your model and verification data?
@yufenglee
onnx version 1.7.0, onnxruntime version 1.5.2
I use the Hugging Face "bert-base-multilingual-cased" model with one Linear layer as a classifier.
I use a private sentiment-analysis dataset, but I can give you an example.
input :"[CLS]How are you[SEP]fine, thank you[SEP]" label : 1
```python
def quantize_dynamic(model_input: Path,
                     model_output: Path,
                     op_types_to_quantize=[],
                     per_channel=False,
                     reduce_range=False,
                     activation_type=QuantType.QUInt8,
                     weight_type=QuantType.QUInt8,
                     nodes_to_quantize=[],
                     nodes_to_exclude=[])
```
Does it matter if activation_type is different from weight_type?
It also raises another error when I try to load the quantized ONNX model with onnxruntime.
It looks like a NotImplemented error. Is that what you expect when activation_type is QuantType.QInt8?
thanks!
@marksein07, we support both QUInt8 and QInt8 for weights, but only QUInt8 for activations. You will see this error if you use QInt8 for activations.
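In code, the supported combination described above would look like this (model paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# u8s8: weights may be QInt8, but activation_type must stay QUInt8;
# passing QuantType.QInt8 for activations raises the error shown above.
quantize_dynamic("model_opt.onnx", "model_u8s8.onnx",
                 activation_type=QuantType.QUInt8,
                 weight_type=QuantType.QInt8)
```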
@marksein07, could you please try reduce_range while quantizing, as mentioned in #5849?
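A minimal sketch of that suggestion, keeping s8 weights but with reduce_range=True (model paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# reduce_range=True quantizes weights with a reduced range, which can avoid
# saturation in u8s8 matmuls on CPUs without VNNI support.
quantize_dynamic("model_opt.onnx", "model_u8s8_rr.onnx",
                 weight_type=QuantType.QInt8,
                 reduce_range=True)
```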
@yufenglee, it's much better, thanks! However, there is still a small accuracy drop, which in my private experiments has nothing to do with model robustness.
@marksein07, I just re-ran the notebook and the accuracy is good with u8s8. I would like to double-check that you get better accuracy with u8u8, right?
@yufenglee, I'm not sure what you mean by "u8u8". Does "u8u8" mean calling quantize_dynamic with the parameters below?
activation_type=QuantType.QUInt8,
weight_type=QuantType.QUInt8,
If u8u8 is what I guess (QUInt8 activations with QUInt8 weights), then yes: u8u8 gives about 20% better accuracy than u8s8 (QUInt8 activations with QInt8 weights).
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
For the Convert_Models_and_Tune_Performance_with_OLive_Docker_Images sample code above, in the 8th cell, which quantizes the model, weight_type should be QuantType.QUInt8, or the accuracy of the quantized model might drop significantly.
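Applied to that cell, the recommended settings from this thread would look roughly like this (the paths are placeholders, not the notebook's actual file names):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# u8u8: QUInt8 activations and QUInt8 weights, which avoided the
# accuracy drop reported in this thread.
quantize_dynamic("model_opt.onnx", "model_quant.onnx",
                 weight_type=QuantType.QUInt8)
```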