marksein07 opened this issue 4 years ago
@marksein07, QInt8 should give slightly better performance. Could you share the OS and CPU information of the machine on which you get worse perf with QInt8?
@yufenglee To clarify, it's the deep learning model's prediction accuracy that drops significantly; speed-wise, QInt8 does indeed outperform QUInt8, as you say.
The two quantized models come from the same optimized ONNX model above, and the only difference between them is
weight_type = QuantType.QInt8
versus
weight_type = QuantType.QUInt8
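A minimal sketch of the two calls (the file names are placeholders; the actual script was not shared in the thread):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Same optimized source model, differing only in the weight type.
quantize_dynamic("model_opt.onnx", "model_s8.onnx", weight_type=QuantType.QInt8)
quantize_dynamic("model_opt.onnx", "model_u8.onnx", weight_type=QuantType.QUInt8)
```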
I understand that a quantized model loses some fidelity. However, the prediction is run on the training dataset, so accuracy shouldn't drop considerably.
I almost copy-pasted the sample code, yet when verifying the quantized model's accuracy I got a totally different result from the sample.
My OS is Ubuntu 18.04, ONNX version 1.7.0, ONNX Runtime version 1.5.2, and CPU info:
Thanks!
@marksein07 , thanks for your clarification. Which version are you using and could you share your model and verification data?
@yufenglee
onnx version 1.7.0, onnxruntime version 1.5.2
I use the Hugging Face "bert-base-multilingual-cased" model with one Linear layer as a classifier.
I use a private sentiment-analysis dataset, but I can give you an example.
input :"[CLS]How are you[SEP]fine, thank you[SEP]" label : 1
```python
def quantize_dynamic(model_input: Path,
                     model_output: Path,
                     op_types_to_quantize=[],
                     per_channel=False,
                     reduce_range=False,
                     activation_type=QuantType.QUInt8,
                     weight_type=QuantType.QUInt8,
                     nodes_to_quantize=[],
                     nodes_to_exclude=[])
```
Does it matter if activation_type is different from weight_type?
It also raises another error when I try to load the quantized ONNX model with onnxruntime.
It looks like a NotImplemented error. Is that what you expect when activation_type is QuantType.QInt8?
thanks!
@marksein07, we support both QUInt8 and QInt8 for weights, but only QUInt8 for activations. You will see this error if you use QInt8 for activations.
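In code, the supported combination described above would look like this (model paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# u8s8: weights may be QInt8, but activation_type must stay QUInt8;
# passing QuantType.QInt8 for activations raises the error shown above.
quantize_dynamic("model_opt.onnx", "model_u8s8.onnx",
                 activation_type=QuantType.QUInt8,
                 weight_type=QuantType.QInt8)
```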
@marksein07, could you please try reduce_range while quantizing, as mentioned in #5849?
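A minimal sketch of that suggestion, keeping s8 weights but with reduce_range=True (model paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# reduce_range=True quantizes weights with a reduced range, which can avoid
# saturation in u8s8 matmuls on CPUs without VNNI support.
quantize_dynamic("model_opt.onnx", "model_u8s8_rr.onnx",
                 weight_type=QuantType.QInt8,
                 reduce_range=True)
```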
@yufenglee, it's much better, thanks! However, there is still a small accuracy drop, which in my private experiments has nothing to do with model robustness.
@marksein07, I just re-ran the notebook and the accuracy is good with u8s8. I would like to double-check that you get better accuracy with u8u8, right?
@yufenglee, I'm not sure what you mean by "u8u8". Does "u8u8" mean calling quantize_dynamic with the parameters below?
activation_type=QuantType.QUInt8,
weight_type=QuantType.QUInt8,
If u8u8 is what I guess (QUInt8 activations with QUInt8 weights), then yes: u8u8 gives about 20% better accuracy than u8s8 (QUInt8 activations with QInt8 weights).
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
For the Convert_Models_and_Tune_Performance_with_OLive_Docker_Images sample code above, in the 8th cell, which quantizes the model, weight_type should be QuantType.QUInt8, or the accuracy of the quantized model might drop significantly.
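Applied to that cell, the recommended settings from this thread would look roughly like this (the paths are placeholders, not the notebook's actual file names):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# u8u8: QUInt8 activations and QUInt8 weights, which avoided the
# accuracy drop reported in this thread.
quantize_dynamic("model_opt.onnx", "model_quant.onnx",
                 weight_type=QuantType.QUInt8)
```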