tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Apache License 2.0

Full Int8 QAT not working #974

Open MATTYGILO opened 2 years ago

MATTYGILO commented 2 years ago

Just a quick question. I want my final model to be full int8, rather than float32, for its inputs and outputs, and I want training to be as accurate as possible. Do I train with quantised inputs and outputs? I followed the standard procedure in the comprehensive guide (with my custom model) and it hasn't worked. Here is what I did:

  1. I trained using the comprehensive guide, adapted to my model.
  2. After training, I use these settings to quantise my model:
    converter.experimental_new_converter = True
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
  3. When I go to evaluate the model it is completely inaccurate

What do I need to do to allow for full int8 to work?

All help welcome

thaink commented 2 years ago

Hi Matty, passing converter.representative_dataset = representative_dataset is only required for post-training quantization. If you want to use QAT, follow the guide at https://www.tensorflow.org/model_optimization/guide/quantization/training_example (use quantize_model before training, train it with non-quantized input as usual, and then convert it to TFLite).
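
As a rough illustration of why QAT trains on ordinary float input: QAT emulates quantization inside the model with fake-quantization steps, which snap a float value to the nearest level of an 8-bit grid and hand it back as a float. The pure-Python sketch below is a simplified version of that idea, not the tfmot implementation. The function name fake_quant and the fixed range are assumptions made for the example, and it omits the range nudging that the real ops perform.

```python
def fake_quant(x, range_min, range_max, num_bits=8):
    """Quantize-dequantize a float: clamp, snap to a uniform grid, return a float."""
    levels = 2 ** num_bits - 1                   # 255 steps for 8 bits
    scale = (range_max - range_min) / levels     # width of one grid step
    clamped = max(range_min, min(range_max, x))  # values outside the range saturate
    step = round((clamped - range_min) / scale)  # integer grid index in [0, levels]
    return range_min + step * scale


# The input and the output are both floats; only the precision changes.
approx = fake_quant(0.1234, -1.0, 1.0)
```

Because the rounding happens inside the graph, the training data itself can stay in float32.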

MATTYGILO commented 2 years ago

@thaink I have followed the guides. However, I'm using TFLite Micro, which requires full int8. None of the examples show what to do for full int8 input and output. Even if you use QAT, you still have to convert the model using post-training quantization, and there are no examples of int8 inputs and outputs for QAT.

thaink commented 2 years ago

The inference_input_type and inference_output_type settings are what make the model use int8 input and output.
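
One consequence worth checking when evaluating such a model: with int8 input and output, the caller has to quantize the input and dequantize the output using the scale and zero point stored in the converted model. In the Python interpreter these are available as the "quantization" entry of interpreter.get_input_details()[0] and interpreter.get_output_details()[0]. A minimal pure-Python sketch of the mapping follows; the helper names and the example scale and zero point are made up for illustration.

```python
import math


def quantize_to_int8(values, scale, zero_point):
    """Map floats to int8 with q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    quantized = []
    for x in values:
        v = x / scale
        # Round half away from zero (Python's round() rounds half to even).
        rounded = int(math.copysign(math.floor(abs(v) + 0.5), v))
        quantized.append(max(-128, min(127, rounded + zero_point)))
    return quantized


def dequantize_from_int8(values, scale, zero_point):
    """Map int8 values back to floats with x = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in values]


# Hypothetical parameters; real ones come from the converted model.
scale, zero_point = 0.25, -3
q = quantize_to_int8([0.0, 0.5, 100.0], scale, zero_point)
x = dequantize_from_int8(q[:2], scale, zero_point)
```

Feeding float values that were simply cast to int8, or reading the raw int8 output as if it were a float score, can produce results that look like rubbish even when the model itself is fine.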

MATTYGILO commented 2 years ago

@thaink I've already set those values. Are you suggesting I train on quantised data?

thaink commented 2 years ago

Can you share or describe what your output model looks like?

MATTYGILO commented 2 years ago

@thaink It's a YAMNet. I followed this Medium post: https://medium.com/@antonyharfield/converting-the-yamnet-audio-detection-model-for-tensorflow-lite-inference-43d049bd357c

MATTYGILO commented 2 years ago

@thaink I've converted the model to full int8, but the output of the model is complete rubbish. So I did QAT and converted that model to full int8 as well, but the output is still complete rubbish.

MATTYGILO commented 2 years ago

@thaink What is the suggested way of doing full int8 QAT on a model?

MATTYGILO commented 2 years ago

@thaink This is how I do QAT:

import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

class DefaultDenseQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # List all of your weights
    weights = {
        "kernel": LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)
    }

    # List of all your activations
    activations = {
        "activation": MovingAverageQuantizer(num_bits=8, symmetric=False, narrow_range=False, per_axis=False)
    }

    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.weights.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        output = []
        for attribute, quantizer in self.activations.items():
            if hasattr(layer, attribute):
                output.append((getattr(layer, attribute), quantizer))

        return output

    def set_quantize_weights(self, layer, quantize_weights):
        # Add this line for each item returned in `get_weights_and_quantizers`
        # , in the same order

        count = 0
        for attribute in self.weights.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_weights[count])
                count += 1

    def set_quantize_activations(self, layer, quantize_activations):
        # Add this line for each item returned in `get_activations_and_quantizers`
        # , in the same order.
        count = 0
        for attribute in self.activations.keys():
            if hasattr(layer, attribute):
                setattr(layer, attribute, quantize_activations[count])
                count += 1

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}
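
# Separate script: applies the config above, which is imported from quant.py.
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.python.core.quantization.keras.quantize import quantize_scope, quantize_apply

from quant import DefaultDenseQuantizeConfig

# CustomLayer is defined elsewhere in the project and is not shown in this post.
with quantize_scope({
    "DefaultDenseQuantizeConfig": DefaultDenseQuantizeConfig,
    "CustomLayer": CustomLayer
}):
    def apply_quantization_to_layer(layer):
        return tfmot.quantization.keras.quantize_annotate_layer(layer, DefaultDenseQuantizeConfig())

    # The arguments to clone_model were lost in the original post. The two below
    # are assumed; `model` stands for the float Keras model being annotated.
    annotated_model = tf.keras.models.clone_model(
        model,
        clone_function=apply_quantization_to_layer,
    )

    qat_model = tfmot.quantization.keras.quantize_apply(annotated_model)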



Please, I need all the help and advice I can get.

thaink commented 2 years ago

@Xhark Could you check whether Matt is doing QAT the right way?

haozh7109 commented 1 year ago

Hi @MATTYGILO, I am experiencing the same problem. The full int8 TensorFlow Lite model derived from QAT (using reference data to set the input and output to int8) doesn't seem to work: I lose a lot of accuracy after the model conversion. Did you find a solution for this full int8 QAT model conversion? Thank you!

Alexey234432 commented 1 year ago

Thank you very much for your help. I am facing the same issue with MobileNetV3 (with both PTQ and QAT). Any ideas on why this might be the case? Thank you. @thaink

tarushbansal commented 7 months ago

Hi, I am facing the same issue with QAT on MobileNetV3 (the accuracy of the QAT TFLite model is much lower than that of the corresponding QAT Keras model). Is there any fix for this yet?