tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

About the Quantize layer when converting a model #997

Open lzcchl opened 2 years ago

lzcchl commented 2 years ago

I have two models, both MobileNetV1 for classification. The first model was downloaded from Google: https://storage.googleapis.com/download.tensorflow.org/models/tflite/mobilenet_v1_224_android_quant_2017_11_08.zip

The second model I created myself, with its layers matching the first model's. I trained it with Keras and applied post-training quantization (PTQ) to get a TFLite model whose input and output are 'uint8', but my model has two extra 'Quantize' layers, one at the head and one at the tail, as in the images below.

The first model runs on my NPU in about 8 ms, but the second takes about 30 ms. What is happening? The only difference is those two 'Quantize' layers. What can I do? I followed the sample https://tensorflow.google.cn/lite/performance/post_training_integer_quant to quantize my model, but it is slower than the official model and differs slightly from it. Please help with some suggestions, or point me to another guide or sample code for getting a 'uint8' model. A sketch of the flow I followed is below the screenshots.

[screenshots: the two model graphs, showing the extra Quantize layers at the head and tail of the second model]
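
For context, the integer-only PTQ flow from that guide looks roughly like this; `model` (the trained Keras model) and `calibration_images` (an iterable of float32 input arrays) are placeholder names, not from this issue:

import tensorflow as tf

def representative_dataset():
    # Yield a few hundred calibration samples shaped like the model input
    for image in calibration_images:
        yield [image]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to integer-only ops and request uint8 input/output
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()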

dansuh17 commented 2 years ago

@sngyhan Could you take a look at this?

sngyhan commented 2 years ago

Hi @lzcchl, our new quantizer, which uses MLIR, does not officially support uint8. You need to use the TOCO converter. Could you check with this change:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.experimental_new_converter = False   # fall back to the legacy (TOCO) converter
converter.experimental_new_quantizer = False   # fall back to the legacy quantizer
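
For completeness, a minimal end-to-end sketch built on that suggestion, combined with the quantization settings from the integer-quantization guide; `saved_model_dir` comes from the snippet above, while `representative_dataset` is a hypothetical calibration generator you would supply:

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.experimental_new_converter = False   # legacy (TOCO) converter
converter.experimental_new_quantizer = False   # legacy quantizer, which supports uint8
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # hypothetical generator yielding [input]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_model = converter.convert()
with open('model_uint8.tflite', 'wb') as f:
    f.write(tflite_model)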