tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

After QAT and TFLite conversion, the input and output types of an AveragePooling2D node are not the same #622

Open guls999 opened 3 years ago

guls999 commented 3 years ago

I use QAT to fine-tune an InceptionV3 model and save checkpoints while the model fits the data. Then I load model.h5 to restore the model. When I convert the h5 model to TFLite, there is a node whose input type is int8 but whose output type is float32, so I get the following error when I run inference with the TFLite model.

```
2021-01-25 09:00:36.746994: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "infer_tflite.py", line 13, in <module>
    interpreter.allocate_tensors()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter.py", line 335, in allocate_tensors
    return self._interpreter.AllocateTensors()
RuntimeError: tensorflow/lite/kernels/pooling.cc:79 input->type != output->type (INT8 != FLOAT32) Node number 8 (AVERAGE_POOL_2D) failed to prepare.
```
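For context, here is a minimal sketch of the inference step that triggers the error above. The script contents and the model path are assumptions, not the original infer_tflite.py:

```python
import numpy as np
import tensorflow as tf

# Placeholder path; the original report may convert and save the model elsewhere.
interpreter = tf.lite.Interpreter(model_path='inception_quant.tflite')
interpreter.allocate_tensors()  # raises RuntimeError: INT8 != FLOAT32 on AVERAGE_POOL_2D

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input just to exercise the graph.
dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)
```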

System information

TensorFlow version: 2.5.0.dev20210124

TensorFlow Model Optimization version: 0.5.0

Python version: 3.6.9

I follow the official guide to train and convert the model:

```python
model = load_model(classic_model, compile=False)
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
```

I use ModelCheckpoint to save the quantized model and load_weights to restore it:

```python
model = load_model('inception_v3.h5', compile=False)
quantize_model = tfmot.quantization.keras.quantize_model
q_aware_model = quantize_model(model)
q_aware_model.load_weights('inception_quant.h5', by_name=True)
```
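For reference, a minimal sketch of how such a checkpoint could be produced during fine-tuning; the dataset, optimizer, and hyperparameters below are placeholders and not taken from the original report:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base_model = tf.keras.models.load_model('inception_v3.h5', compile=False)
q_aware_model = tfmot.quantization.keras.quantize_model(base_model)

q_aware_model.compile(optimizer='adam',
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])

# Dummy data purely to make the sketch runnable; replace with the real dataset.
train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([8, 299, 299, 3]),
     tf.one_hot(tf.zeros([8], tf.int32), 1000))).batch(4)

# Save weights during fine-tuning; load_weights(..., by_name=True) restores them later.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    'inception_quant.h5', save_weights_only=True)

q_aware_model.fit(train_ds, epochs=1, callbacks=[checkpoint])
```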

I use tf.lite to convert the model:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
```
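One way to confirm the type mismatch without running full inference is to dump the per-tensor types of the converted model. This is a debugging sketch that assumes the in-memory quantized_tflite_model from the step above:

```python
import tensorflow as tf

# Inspect tensor metadata of the converted flatbuffer; this does not require
# allocate_tensors(), so it works even on the failing model.
interpreter = tf.lite.Interpreter(model_content=quantized_tflite_model)
for detail in interpreter.get_tensor_details():
    print(detail['index'], detail['name'], detail['dtype'])
# In the failing model, the AVERAGE_POOL_2D node's input is int8 while its output is float32.
```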

In fact, the input and output types of the node should be the same, but after I use tf.lite.TFLiteConverter.from_keras_model(q_aware_model) I get the error described above.

Another point: if I don't call load_weights on the new model, I can successfully convert the h5 model to a TFLite model and run inference with it. If I do call load_weights on the new model, the error above occurs.

[image]

Fig. 1: model before loading weights

[image]

Fig. 2: model after loading weights

I can't find any difference except in the parameter values.

[image]

Fig. 3: TFLite model converted from the h5 model before loading weights

[image]

Fig. 4: TFLite model converted from the h5 model after loading weights

qiyangzhang0329 commented 3 years ago

Having the same problem, watching this thread.

sunzhe09 commented 3 years ago

me too

teijeong commented 3 years ago

Can you share the steps to generate those two files, inception_v3.h5 and inception_quant.h5?

teijeong commented 3 years ago

I tried with this colab, but can't reproduce the issue.

Meanwhile, it looks strange that some layers are not quantized - I'll take a look.

fredrec commented 3 years ago

I also could not reproduce the different behaviour when converting directly vs. converting after loading quantized weights. However, there seem to be improperly quantized nodes in the converted TFLite model (regardless of whether quantized weights are loaded).

[image: Conv2D node not quantized]

Codelab to reproduce.

fredrec commented 3 years ago

The issue has been fixed in tensorflow>=2.6.0rc0. Please upgrade.

A test has been added to prevent regression.

qiyangzhang0329 commented 3 years ago

@fredrec How long until TensorFlow >= 2.6.0rc0 is expected to be supported? And do all models support QAT?