Open Zjq9409 opened 3 years ago
Hi @jianqianzhou, is it the same as #6823? If so, please track this issue there. Thanks.
Not the same. #6823 uses a TensorFlow ALBERT model for quantization, but an error occurred during the optimization phase, so there was no result at all. This issue uses a PyTorch ALBERT model; after optimization and quantization, the quantized result differs from the original result.
Does onnxruntime currently support quantization of ALBERT for both PyTorch and TensorFlow? And does the tokenizer affect the accuracy?
@jianqianzhou, it is expected that the quantized model will produce different output, since quantize and dequantize nodes are inserted into the graph. Could you evaluate the classification accuracy instead of comparing raw tensor values?
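A minimal sketch of that kind of check with numpy, using the logits posted later in this thread (`label_agreement` is a hypothetical helper, not an onnxruntime API):

```python
import numpy as np

def label_agreement(logits_a, logits_b):
    """Fraction of examples for which two models predict the same class."""
    preds_a = np.argmax(logits_a, axis=-1)
    preds_b = np.argmax(logits_b, axis=-1)
    return float(np.mean(preds_a == preds_b))

# The logits reported in this issue (batch of 1, 10 classes):
origin = np.array([[-4.9603, -9.1380, -12.4145, -2.8629, -2.9166,
                    -14.1528, -0.7807, -1.6513, -8.1648, 12.1220]])
quantized = np.array([[-5.6812, -11.3905, -21.9474, 0.0971, -8.1226,
                       -19.2604, 1.3498, -9.9139, -16.5754, 5.7205]])

# Although the raw values differ, both models pick class 9 for this input.
print(label_agreement(origin, quantized))
```

Over a real evaluation set, comparing predicted labels (or the task metric) is a fairer test of post-training quantization than element-wise tensor comparison.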
Quantization works well on the BERT model: in the MLPerf test, accuracy on SQuAD is on par with the original model. We have not tested it on ALBERT yet.
If the accuracy of post-training quantization does not meet your requirements, you will have to try quantization-aware training (QAT), which should give accuracy close to that of your trained model.
Hey @Zjq9409
I see you've had some of the same issues that I'm currently having. Did you manage to either optimise or quantise ALBERT?
This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the bug: I use the Hugging Face Transformers ALBERT model albert-base-v2 to classify text. I optimized and quantized it with onnxruntime:
```python
opt_model = optimizer.optimize_model(
    'onnx/albert_chinese_base.onnx',
    'bert',
    num_heads=12,
    hidden_size=768,
    optimization_options=opt_options)
opt_model.save_model_to_file('albert.opt.onnx')

quantized_model_path = quantize(Path("albert.opt.onnx"))
```
The optimized result matches the original result, but the quantized result differs from the original result, as follows:

Original result:
tensor([[ -4.9603, -9.1380, -12.4145, -2.8629, -2.9166, -14.1528, -0.7807, -1.6513, -8.1648, 12.1220]])

Quantized result:
tensor([[ -5.6812, -11.3905, -21.9474, 0.0971, -8.1226, -19.2604, 1.3498, -9.9139, -16.5754, 5.7205]])

System information:
onnx 1.8.1
onnxruntime 1.6.0
onnxruntime-tools 1.6.0
transformers 4.3.3
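As a rough sanity check on how far apart the two outputs really are, the tensors above can be compared directly; a small numpy sketch (`cosine_sim` is a hypothetical helper, not part of any library here) shows the vectors remain positively correlated and yield the same argmax:

```python
import numpy as np

origin = np.array([-4.9603, -9.1380, -12.4145, -2.8629, -2.9166,
                   -14.1528, -0.7807, -1.6513, -8.1648, 12.1220])
quantized = np.array([-5.6812, -11.3905, -21.9474, 0.0971, -8.1226,
                      -19.2604, 1.3498, -9.9139, -16.5754, 5.7205])

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_sim(origin, quantized)
same_label = int(np.argmax(origin)) == int(np.argmax(quantized))
print(sim, same_label)
```

A high cosine similarity with an unchanged predicted class suggests quantization noise rather than a broken model, which is consistent with the advice above to evaluate task accuracy instead of raw tensor values.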