microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

albert quantized #6847

Open Zjq9409 opened 3 years ago

Zjq9409 commented 3 years ago

Describe the bug
I use the Hugging Face Transformers ALBERT model (albert-base-v2) to classify text. I optimized and quantized it with ONNX Runtime:

```python
from onnxruntime_tools import optimizer

# opt_options: optimization options created earlier (not shown)
opt_model = optimizer.optimize_model(
    'onnx/albert_chinese_base.onnx',
    'bert',
    num_heads=12,
    hidden_size=768,
    optimization_options=opt_options)
opt_model.save_model_to_file('albert.opt.onnx')
```

```python
from pathlib import Path

# quantize: helper wrapping the onnxruntime quantization API (not shown)
quantized_model_path = quantize(Path("albert.opt.onnx"))
```

The optimized model produces the same result as the original, but the quantized result differs:

Original result:

```
tensor([[ -4.9603,  -9.1380, -12.4145,  -2.8629,  -2.9166, -14.1528,
          -0.7807,  -1.6513,  -8.1648,  12.1220]])
```

Quantized result:

```
tensor([[ -5.6812, -11.3905, -21.9474,   0.0971,  -8.1226, -19.2604,
           1.3498,  -9.9139, -16.5754,   5.7205]])
```
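As a quick sanity check on the two logit vectors above, one can compare predicted classes (argmax) rather than raw tensor values. A minimal sketch, using the values copied from the outputs above:

```python
import numpy as np

# Logits copied verbatim from the original and quantized outputs above.
original = np.array([-4.9603, -9.1380, -12.4145, -2.8629, -2.9166,
                     -14.1528, -0.7807, -1.6513, -8.1648, 12.1220])
quantized = np.array([-5.6812, -11.3905, -21.9474, 0.0971, -8.1226,
                      -19.2604, 1.3498, -9.9139, -16.5754, 5.7205])

# Raw values diverge substantially...
print("max abs diff:", np.max(np.abs(original - quantized)))  # 9.5329

# ...but both models still predict the same class for this input.
print("original argmax:", original.argmax())    # 9
print("quantized argmax:", quantized.argmax())  # 9
```

For this particular input the predicted class is unchanged despite the large shift in logit values, which is why accuracy on a labeled test set is a more meaningful metric than elementwise tensor comparison.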

System information:

- onnx 1.8.1
- onnxruntime 1.6.0
- onnxruntime-tools 1.6.0
- transformers 4.3.3

jcwchen commented 3 years ago

Hi @jianqianzhou, is this the same as https://github.com/microsoft/onnxruntime/issues/6823? If so, please track the issue there. Thanks.

Zjq9409 commented 3 years ago

> Hi @jianqianzhou, is this the same as #6823? If so, please track the issue there. Thanks.

Not the same. #6832 quantizes a TensorFlow ALBERT model, but an error occurred during the optimization phase, so there was no result. This issue quantizes a PyTorch ALBERT model; after optimization and quantization, the quantized result differs from the original result.

Zjq9409 commented 3 years ago


Does onnxruntime currently support quantization of ALBERT for both PyTorch and TensorFlow? And will the tokenizer affect accuracy?

tianleiwu commented 3 years ago

@jianqianzhou, I think it is expected that the quantized model produces different output, since there are quantize and de-quantize operations in the graph. Could you evaluate the classification accuracy instead of comparing tensor values?
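To illustrate why quantize/de-quantize nodes change the output, here is a small NumPy sketch of a symmetric int8 linear quantization round trip. This is illustrative only, not ONNX Runtime's exact implementation; the function name and per-tensor scale scheme are assumptions for the example:

```python
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Symmetric per-tensor linear quantize followed by de-quantize (illustrative)."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = np.max(np.abs(x)) / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q.astype(np.float32) * scale       # de-quantize back to float

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
w_dq = quantize_dequantize(w)

# The round trip is lossy: each value can move by up to scale / 2 due to rounding.
err = np.max(np.abs(w - w_dq))
print("max round-trip error:", err)
```

These small per-value rounding errors accumulate through the many MatMul and attention layers of a transformer, so the final logits can shift noticeably even when the predicted class, and hence accuracy, is largely preserved.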

Quantization works well on the BERT model: its accuracy on SQuAD is on par in the MLPerf test. We have not tested it on ALBERT yet.

If the accuracy of post-training quantization cannot meet your requirement, you will have to try quantization-aware training (QAT), which should have the same accuracy as your trained model.

Zjq9409 commented 3 years ago

test

danielbellhv commented 2 years ago

Hey @Zjq9409

I see you've had some of the same issues that I'm currently having.

Did you manage to either optimise or quantise ALBERT?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.