rednoah91 opened this issue 3 years ago
@rednoah91 Why are you using https://github.com/fatihcakirs/mobile_models vs https://github.com/mlcommons/mobile_models?
@mcharleb The mobileBERT model in https://github.com/mlcommons/mobile_models points to https://github.com/fatihcakirs/mobile_models. They are the same.
In the model downloaded from https://github.com/fatihcakirs/mobile_models/blob/main/v0_7/tflite/mobilebert_int8_384_20200602.tflite
some fully-connected weights have a non-zero zero-point (e.g., the weight
bert/encoder/layer_0/attention/self/MatMul19
has zero-point = 6), which violates the TFLite quantization spec. I am afraid this might cause issues on implementations that bypass the FC weight zero-point calculation.
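For anyone who wants to reproduce this, here is a minimal sketch of how such zero-points can be found, assuming the TensorFlow Python API. The helper below works on the list of dicts returned by tf.lite.Interpreter.get_tensor_details(); note it scans all tensors, so in practice you would restrict it to constant weight tensors, since activations may legitimately carry non-zero zero-points.

```python
# Sketch: flag tensors whose quantization zero-points are not all zero.
# The dict shape mirrors tf.lite.Interpreter.get_tensor_details() output.
import numpy as np

def nonzero_zero_points(tensor_details):
    """Return a list of (tensor_name, zero_points) for every tensor
    whose zero-points are not all zero."""
    offenders = []
    for d in tensor_details:
        zps = np.asarray(d["quantization_parameters"]["zero_points"])
        if zps.size and np.any(zps != 0):
            offenders.append((d["name"], zps.tolist()))
    return offenders

# Usage against the actual model (not run here):
#   import tensorflow as tf
#   interp = tf.lite.Interpreter(
#       model_path="mobilebert_int8_384_20200602.tflite")
#   print(nonzero_zero_points(interp.get_tensor_details()))
```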
This model was provided by Google as a QAT model and approved for use by the mobile working group.
@jwookiehong @rnaidu02 , can you bring this up in the mobile group discussion? I think the group needs to bless this (or fix this)
@freedomtan , can you help with the question on the TFLite quant spec?
As @Mostelk mentioned, this is a Quantization-Aware Training (QAT) model provided by Google colleagues. The quantization spec mentioned by @rednoah91 mainly targets Post-Training Quantization (PTQ).
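To make the distinction concrete, here is a small illustrative sketch (my own, not from either toolchain) of the two quantization schemes in question: the TFLite spec quantizes int8 weights symmetrically, so the zero-point is fixed at 0, while an asymmetric scheme uses the full [min, max] range and generally produces a non-zero zero-point, like the 6 reported above.

```python
# Sketch: symmetric vs. asymmetric int8 quantization of a weight array.
import numpy as np

def quantize_symmetric(w, num_bits=8):
    # Symmetric: scale from max |w|, zero-point forced to 0
    # (what the TFLite spec expects for weights).
    qmax = 2 ** (num_bits - 1) - 1                     # 127
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale, 0                                 # zero-point always 0

def quantize_asymmetric(w, num_bits=8):
    # Asymmetric: map [min, max] onto [-128, 127];
    # the zero-point is generally non-zero.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = int(np.round(qmin - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point
```

An implementation that assumes the symmetric spec can skip the weight zero-point term in the FC inner product entirely, which is exactly the shortcut that breaks on a model quantized asymmetrically.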