tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

TensorFlow 1.15 and TensorFlow 2.4 produce different int8 post-training quantization results for the same pb model: the float32 bias values are identical, but the int8 convolution-kernel weights differ #794

Open carryzhang123 opened 3 years ago

carryzhang123 commented 3 years ago

1. System information

TensorFlow 1.15, TensorFlow 2.4

2. Code


    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(path)
    # Deprecated TF1-era flag; the TF2 equivalent is
    # converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.post_training_quantize = True
    # In current TF this attribute is `target_spec.supported_ops`:
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
                                           tf.lite.OpsSet.SELECT_TF_OPS]
    tflite_model = converter.convert()
daverim commented 3 years ago

Hi, could you provide an example?

TF1.15 may have had some issues with bias overflow that have been fixed by adjusting the weight scales.
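To make that concrete, here is a minimal pure-Python sketch of the kind of adjustment described. The function name and numbers are illustrative, not the converter's actual code: TFLite stores a conv bias as int32 with scale = input_scale * weight_scale, so a very small weight scale can push the quantized bias past the int32 range, and the fix is to grow the weight scale until the bias fits.

```python
INT32_MAX = 2**31 - 1

def adjust_weight_scale(bias, input_scale, weight_scale):
    """Grow the weight scale if the int32-quantized bias would overflow.

    Illustrative sketch only: bias is quantized as
    bias / (input_scale * weight_scale), which must fit into int32.
    """
    q_bias = bias / (input_scale * weight_scale)
    if abs(q_bias) > INT32_MAX:
        # Smallest weight scale at which the bias fits into int32.
        weight_scale = abs(bias) / (INT32_MAX * input_scale)
    return weight_scale

# A weight scale of 1e-3 here would quantize the bias to ~1e13, far
# past the int32 range, so the adjusted scale comes back larger.
adjusted = adjust_weight_scale(bias=10.0, input_scale=1e-9, weight_scale=1e-3)
```

Note that changing the weight scale also changes the stored int8 weight values, which is one way the same float model can produce different kernel bytes under two converter versions.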

carryzhang123 commented 3 years ago

Two tflite files were generated from the same pb file: one with TF 1.15 and one with TF 2.4. Two things differ:

1. The first layer is conv2d in the pb model, but it appears as depthwise_conv2d in the 1.15 tflite and as conv2d in the 2.4 tflite. [screenshot]
2. The quantized weights of the 1.15 tflite differ from those of the 2.4 tflite, but the LSTM weights are the same. [screenshot]

[screenshot]

The pictures are from Netron. Thank you very much.

rino20 commented 2 years ago

Hi @teijeong, could you take a look?

teijeong commented 2 years ago

Not sure about the different first layer in 1.15 vs 2.4. As for the weight difference: is the left result from TF 2.4? Recent TFLite supports per-channel quantization for conv-like ops, so the filter gets a different scale for each output dimension.
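To illustrate why per-channel quantization alone changes the stored bytes, here is a small numpy sketch (the helper names are hypothetical; symmetric int8 as TFLite uses for conv weights): the same float filter quantized per-tensor vs per-channel yields different int8 values, because each output channel gets its own scale.

```python
import numpy as np

def quantize_per_tensor(w):
    # One symmetric int8 scale for the whole filter (older behavior).
    scale = np.abs(w).max() / 127.0
    return np.round(w / scale).astype(np.int8)

def quantize_per_channel(w):
    # One scale per output channel (last axis of an HWIO conv filter),
    # as recent TFLite does for conv-like ops.
    scales = np.abs(w).max(axis=(0, 1, 2)) / 127.0
    return np.round(w / scales).astype(np.int8)

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 3, 8, 16)).astype(np.float32)  # 3x3 conv filter

q_tensor = quantize_per_tensor(w)
q_channel = quantize_per_channel(w)

# Same float weights, different int8 bytes: channels whose max is below
# the global max get a finer scale, so their quantized values change.
assert (q_tensor != q_channel).any()
```

So a byte-for-byte diff of the weight tensors between a per-tensor and a per-channel model is expected even when both converters are correct; comparing the dequantized (float) values is the fairer check.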