rednoah91 opened this issue 3 years ago
@rednoah91 Why are you using https://github.com/fatihcakirs/mobile_models vs https://github.com/mlcommons/mobile_models?
@mcharleb The mobileBERT model in https://github.com/mlcommons/mobile_models points to https://github.com/fatihcakirs/mobile_models. They are the same.
In the model downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite
some fully-connected weights have a non-zero zero-point (e.g., the weight
bert/encoder/layer_0/attention/self/MatMul19
has zero-point = 6), which violates the TFLite quantization spec. I am afraid this might cause issues on implementations that bypass the FC weight zero-point calculation.
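For anyone who wants to reproduce this, here is a minimal sketch of how such zero-points can be found, assuming the TensorFlow Python API. The helper below works on the list of dicts returned by tf.lite.Interpreter.get_tensor_details(); note it scans all tensors, so in practice you would restrict it to constant weight tensors, since activations may legitimately carry non-zero zero-points.

```python
# Sketch: flag tensors whose quantization zero-points are not all zero.
# The dict shape mirrors tf.lite.Interpreter.get_tensor_details() output.
import numpy as np

def nonzero_zero_points(tensor_details):
    """Return a list of (tensor_name, zero_points) for every tensor
    whose zero-points are not all zero."""
    offenders = []
    for d in tensor_details:
        zps = np.asarray(d["quantization_parameters"]["zero_points"])
        if zps.size and np.any(zps != 0):
            offenders.append((d["name"], zps.tolist()))
    return offenders

# Usage against the actual model (not run here):
#   import tensorflow as tf
#   interp = tf.lite.Interpreter(
#       model_path="mobilebert_int8_384_20200602.tflite")
#   print(nonzero_zero_points(interp.get_tensor_details()))
```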
This model was provided by Google as a QAT model and approved for use by the mobile working group.
@jwookiehong @rnaidu02 , can you bring this up in the mobile group discussion? I think the group needs to bless this (or fix this)
@freedomtan , can you help with the question on the TFLite quant spec?
As @Mostelk mentioned, this is a Quantization-Aware Training (QAT) model provided by Google colleagues. The quantization spec mentioned by @rednoah91 mainly targets Post-Training Quantization (PTQ).
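To make the distinction concrete, here is a small illustrative sketch (my own, not from either toolchain) of the two quantization schemes in question: the TFLite spec quantizes int8 weights symmetrically, so the zero-point is fixed at 0, while an asymmetric scheme uses the full [min, max] range and generally produces a non-zero zero-point, like the 6 reported above.

```python
# Sketch: symmetric vs. asymmetric int8 quantization of a weight array.
import numpy as np

def quantize_symmetric(w, num_bits=8):
    # Symmetric: scale from max |w|, zero-point forced to 0
    # (what the TFLite spec expects for weights).
    qmax = 2 ** (num_bits - 1) - 1                     # 127
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale, 0                                 # zero-point always 0

def quantize_asymmetric(w, num_bits=8):
    # Asymmetric: map [min, max] onto [-128, 127];
    # the zero-point is generally non-zero.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```

An implementation that assumes the symmetric spec can skip the weight zero-point term in the FC inner product entirely, which is exactly the shortcut that breaks on a model quantized asymmetrically.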