tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

QAT conversion RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE' issue with tf-nightly #42082

Closed MeghnaNatraj closed 3 years ago

MeghnaNatraj commented 4 years ago

UPDATE

You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF version 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well).

You do not require any workaround, i.e., you don't have to use TF 1.x.

To verify that your TF version supports this, run the following code and check if it runs successfully:

import tensorflow as tf
assert tf.__version__[:3] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)

ISSUE

System information TensorFlow version (use command below): 2.4.0-dev20200728

Describe the current behavior Error converting quantize aware trained tensorflow model to a fully integer quantized tflite model - error: RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'

Describe the expected behavior Convert successfully

Standalone code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook. https://colab.research.google.com/gist/sayakpaul/8c8a1d7c94beca26d93b67d92a90d3f0/qat-bad-accuracy.ipynb

MeghnaNatraj commented 4 years ago

@sayakpaul

I can reproduce the issue and will get back to you when I resolve this. https://colab.research.google.com/gist/MeghnaNatraj/8458ad508f5355769a980400d4d9d194/qat-bad-accuracy.ipynb

Possible issue: if you remove TFLITE_BUILTINS_INT8 (i.e. don't enforce INT8), it works fine. The issue is that the model has 2 consecutive quantize ops at the beginning and 2 consecutive dequantize ops at the end (not sure why) -- probably because of the way the tf.keras MobileNetV2 model is structured.

A couple of things to note (especially as you are involved in creating awesome tutorials! 👍): the colab gist above has all the final code with the following suggested changes. NOTE: it also has some TODOs where I have simplified the code for faster execution.

  1. Ensure you use the latest tensorflow_model_optimization and tensorflow-datasets and uninstall tensorflow when you install tf-nightly
  2. Code readability: A) Try to group similar code sections together. Sections can be: all imports and initial settings, all data-processing code, all training-related code, all conversion code, etc. B) If your model is for a basic tutorial and it's small, use full paths to keras APIs -- tf.keras.layers..... instead of from tf.keras.layers import *.
  3. For data generation, have 3 parts: 1) train_raw (loaded from tfds) - data has 3 dimensions; 2) train_preprocessed (with all preprocessing steps applied) - data has 3 dimensions; 3) train_data (the final dataset prepared for training, with batching, shuffle and prefetch) - data has 4 dimensions. Note: repeat all 3 for the validation data, BUT do not shuffle the validation (or test) dataset.
  4. Representative dataset - images should only have 4 dimensions. You initially used the batched training data with shape=(32, 244, 244, 3), and we further add a batch dimension in the representative dataset (tf.expand_dims(train_data_image, 0)) - as a result the number of dimensions increases to 5! (1, 32, 244, 244, 5) This causes errors that are quite hard to debug (e.g. PAD op dimensions exceeded >=4). You instead want (1, 244, 244, 5); hence we use the train_preprocessed data (see the 3rd point above), where the images don't yet have a batch dimension and have shape (244, 244, 3), for the representative_dataset function.
  5. Representative dataset - do not use next(iter(train_ds..)). This turns the images and labels into a sequential list of items and causes failures. Instead, iterate with for image, _ in train_ds_preprocessed: (see the sketch after this list).
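To make points 3-5 concrete, here is a minimal, self-contained sketch of a representative dataset function; `train_preprocessed` below is only a random stand-in for the preprocessed, unbatched flowers data, and the 224x224 shape is illustrative:

import tensorflow as tf

# Hypothetical stand-in for the preprocessed, *unbatched* training data from
# point 3 -- in the real notebook this comes from TFDS plus the preprocessing map.
train_preprocessed = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((8, 224, 224, 3)), tf.zeros((8,), dtype=tf.int64)))

def representative_dataset():
    # Iterate directly over the dataset (point 5) instead of next(iter(...)),
    # and add the batch dimension here so each sample is (1, 224, 224, 3) (point 4).
    for image, _ in train_preprocessed:
        yield [tf.expand_dims(image, 0)]
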
sayakpaul commented 4 years ago

Thanks! First of all, the notebook that I had provided to you was meant for reproducing the issue I was facing. Before releasing it publicly, I sure would have modified it a bit.

A couple of things:

uninstall tensorflow when you install tf-nightly

Not sure about this -- when I run pip install tf-nightly at the beginning of a Colab session (before doing anything else), the nightly version always gets reflected. Is there anything specific for which you'd do this?

Sections can be: all imports and initial settings code

I respectfully disagree. I won't put the pip installs inside the same code block where I am importing dependencies. I sometimes break longer code blocks up, which you might have seen in my notebook as well. This is my personal preference. If "all training code" seems a bit unreadable to me, I'd break it into multiple cells, and the same applies to "all conversion code".

If your model is for a basic tutorial and it's small, use full paths to keras APIs -- tf.keras.layers..... instead of from tf.keras.layers import *

Okay, will keep in mind. But for a bit more complex tutorials/notebooks (in general), I don't think I'd follow it.

For data generation

In the original notebook, I first loaded the dataset from tfds, visualized it (which I think is a good practice), mapped the resizing step, then mapped the scaling step and batching-shuffling (shuffling not for the validation set). The only thing I'd change is merging the resizing step and scaling step inside a utility and map them.


If you emphasized the data generation point because I separated the steps into different cells -- yes, I won't generally do that.

Representative dataset

Agreed on the point. You might have mistakenly mentioned 5 channels (244, 244, 5), but note that in the flowers dataset the images come with 3 channels. I also see the problem in the representative_dataset_gen utility I used:

representative_images, _ = next(iter(train_ds))

def representative_dataset_gen():
    for image in representative_images:
        yield [tf.expand_dims(image, 0)]

If I changed it to something like the following, I think it should be good.

representative_images, _ = next(iter(train_ds))

def representative_dataset_gen():
    for image in representative_images:
        yield [image]

I can confirm that in this way image would have a shape of (1, 224, 224, 3).

You might also consider adding these instructions in the documentation.

Representative dataset - do not use next(iter(train_ds..)). This turns the images and labels into a sequential list of items and causes failures. Instead, iterate with for image, _ in train_ds_preprocessed:

Okay. But what if I'd want to restrict the number of instances in the representative dataset? Because for bigger datasets it's very difficult to have the entire training dataset streamed as the representative dataset. Would you suggest something like the following?

train_ds_unbatched = train_ds.unbatch() # train_ds is already batched and preprocessed

def representative_dataset_gen():
    for i, (image, _) in enumerate(train_ds_unbatched):
        if i == 100: # let's say I want 100 samples only
            break
        yield [image]
MeghnaNatraj commented 4 years ago

Great points! Yes, you can choose what you think works best --- e.g. I learnt many new things from your tutorial (loading TFDS datasets with the [:85%].. method! Who knew! :))

For the representative dataset -- Would .take() work? https://www.tensorflow.org/datasets/overview#as_numpy (ignore the as_numpy() part... just wanted to show an example usage)
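For reference, a .take()-based variant of the earlier generator might look like the following; `train_ds_unbatched` is assumed to be defined exactly as in the previous comment (the batched, preprocessed dataset with .unbatch() applied):

import tensorflow as tf

def representative_dataset_gen():
    # .take(100) caps the number of calibration samples without manual counting
    for image, _ in train_ds_unbatched.take(100):
        yield [tf.expand_dims(image, 0)]  # add the batch dimension expected for calibration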

sayakpaul commented 4 years ago

Yes, take() should work as well. Having a note in the documentation on handling large datasets while creating the representative dataset would help. The representative dataset generation can get non-trivial at times and here's an example (which I am sure you are already aware of).

leondgarse commented 4 years ago

Do you have any plans for solving this? I just encountered this issue... Here is my minimal reproducing code:

- Load MNIST dataset
```py
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
```
- Define the model architecture
```py
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])
```
- Train the digit classification model
```py
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))
# 1875/1875 [==============================] - 2s 946us/step - loss: 0.7303 - accuracy: 0.8100 - val_loss: 0.3097 - val_accuracy: 0.9117
```
- Train the quantization aware model
```py
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam',
                      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])
q_aware_model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))
# 1875/1875 [==============================] - 2s 1ms/step - loss: 0.3107 - accuracy: 0.9136 - val_loss: 0.2824 - val_accuracy: 0.9225
```
- Convert
```py
# Define the representative data.
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(train_images.astype("float32")).batch(1).take(100):
        yield [input_value]

# Successful converting from model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

# Successful converting from model to uint8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()

# Successful converting from q_aware_model
q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_tflite_model = q_converter.convert()

# Fail converting from q_aware_model to uint8
q_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()
```
Throws error:
```py
RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'.
```
- Fail converting from q_aware_model to uint8 (without enforcing TFLITE_BUILTINS_INT8)
```py
q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()
```
Throws error:
```py
RuntimeError: Unsupported output type UINT8 for output tensor 'Identity' of type FLOAT32.
```
MeghnaNatraj commented 4 years ago

You can resolve the second error by removing the line q_converter.inference_output_type = tf.uint8. We're currently working on fixing this -- will post an update when it's done.
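A minimal sketch of that workaround, assuming `q_aware_model` and `representative_data_gen` are defined as in the comment above:

import tensorflow as tf

q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_converter.inference_input_type = tf.uint8
# Note: no q_converter.inference_output_type = tf.uint8 here -- the output stays float32 for now.
q_tflite_model_quant = q_converter.convert()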

leondgarse commented 4 years ago

Thanks for your update. Yeah, removing q_converter.inference_output_type = tf.uint8 makes the conversion succeed, but leaves the output as float32.

interpreter = tf.lite.Interpreter(model_content=q_tflite_model_quant)
print('input: ', interpreter.get_input_details()[0]['dtype'])
# input:  <class 'numpy.uint8'>
print('output: ', interpreter.get_output_details()[0]['dtype'])
# output:  <class 'numpy.float32'>
dtlam26 commented 4 years ago

What about the problem with RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'? I have also encountered it when trying to convert my model. Is it because QAT does not yet fully support full-integer edge devices like Coral?

dtlam26 commented 4 years ago

@MeghnaNatraj I have reconstructed a ResNet model from the QAT guide https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide

This is my model and I have run QAT successfully with it, but when I try to convert it fully to uint8 or int8 for the Edge TPU, I still get the problem: RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'

My model code:

from tensorflow import Tensor
from tensorflow.keras.layers import Input, Conv2D, ReLU, BatchNormalization,\
                                    Add, AveragePooling2D, Flatten, Dense, concatenate
from tensorflow.keras.models import Model, Sequential
import tensorflow as tf
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class DefaultBNQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        # return []
        return [(layer.weights[i], LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)) for i in range(2)]

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
#       # Add this line for each item returned in `get_weights_and_quantizers`
#       # , in the same order
        # layer.kernel = quantize_weights[0]
        # print(quantize_weights)
        layer.gamma = quantize_weights[0]
        layer.beta = quantize_weights[1]
        # layer.moving_mean = quantize_weights[2]
        # layer.moving_variance = quantize_weights[3]
        # pass

    def set_quantize_activations(self, layer, quantize_activations):
      # Add this line for each item returned in `get_activations_and_quantizers`
      # , in the same order.
        pass

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

def relu_bn(inputs: Tensor) -> Tensor:
    relu = ReLU()(inputs)
    bn = quantize_annotate_layer(BatchNormalization(), DefaultBNQuantizeConfig())(relu)
    return bn

def residual_block(x: Tensor, downsample: bool, filters: int, kernel_size: int = 3) -> Tensor:
    y = Conv2D(kernel_size=kernel_size,
               strides= (1 if not downsample else 2),
               filters=filters,
               padding="same")(x)
    y = relu_bn(y)
    y = Conv2D(kernel_size=kernel_size,
               strides=1,
               filters=filters,
               padding="same")(y)

    if downsample:
        x = Conv2D(kernel_size=1,
                   strides=2,
                   filters=filters,
                   padding="same")(x)
    out = Add()([x, y])
    out = relu_bn(out)
    return out

def create_res_net_quantize(inputs,embedding_size,quantize=True):
    quantize = True
    num_filters = 64
    t = quantize_annotate_layer(BatchNormalization(), DefaultBNQuantizeConfig())(inputs)

    t = Conv2D(kernel_size=3,
               strides=1,
               filters=num_filters,
               padding="same")(t)
    t = relu_bn(t)

    num_blocks_list = [2, 5, 5, 2]
    for i in range(len(num_blocks_list)):
        num_blocks = num_blocks_list[i]
        for j in range(num_blocks):
            t = residual_block(t, downsample=(j==0 and i!=0), filters=num_filters)
        num_filters *= 2

    t = AveragePooling2D(4)(t)
    t = Flatten()(t)
    outputs = Dense(embedding_size)(t)

    model = quantize_annotate_model(Model(inputs, outputs))
    with quantize_scope(
      {'DefaultBNQuantizeConfig': DefaultBNQuantizeConfig}):
  # Use `quantize_apply` to actually make the model quantization aware.
        quant_aware_model = tfmot.quantization.keras.quantize_apply(model)
    quant_aware_model.summary()
    return quant_aware_model

I used the same method as @leondgarse did

hangrymoon01 commented 4 years ago

Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.

dtlam26 commented 4 years ago

Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.

As far as I understand, TF 2.x QAT quantization is not yet supported for full-integer inference. Try QAT with TF 1.x and everything works smoothly.

hangrymoon01 commented 4 years ago

Thanks @dtlam26 for the response. Strangely, it turns out that for MobileNetV1 the conversion works fine. For MobileNetV2, as @MeghnaNatraj pointed out in an earlier comment, there are two consecutive QUANTIZE and DEQUANTIZE nodes for the inputs and outputs respectively, which might be causing the issue. Attaching the screenshots of the INPUT and OUTPUT.

dtlam26 commented 4 years ago

Thanks @dtlam26 for the response. Strangely, it turns out that for MobileNetV1 the conversion works fine. For MobileNetV2, as @MeghnaNatraj pointed out in an earlier comment, there are two consecutive QUANTIZE and DEQUANTIZE nodes for the inputs and outputs respectively, which might be causing the issue. Attaching the screenshots of the INPUT and OUTPUT.

Yes, you can still quantize the model with other types of quantization, just not int8 (I mean TFLITE_BUILTINS_INT8). Furthermore, you can check out this link for details on bypassing QAT on tf2: https://github.com/tensorflow/model-optimization/issues/377#issuecomment-625093345. From that, the input is declared to the model twice, which is why the QAT MobileNetV2 gets the double quantize and dequantize.

matteorisso commented 4 years ago

You can resolve the second error by removing the line q_converter.inference_output_type = tf.uint8. We're currently working on fixing this -- will post an update when it's done.

Any news?

hangrymoon01 commented 4 years ago

Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.

As far as I understand, TF 2.x QAT quantization is not yet supported for full-integer inference. Try QAT with TF 1.x and everything works smoothly.

@dtlam26 can you point to some resources for QAT for tf1.x and then quantization? I am trying the following code (without QAT) but getting an error on TF 1.15:

from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = 20

model = Sequential([
    layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(train_generator, epochs=1, steps_per_epoch=100)

model.save('/tmp/temp.h5')

converter = tf.lite.TFLiteConverter.from_keras_model_file("/tmp/temp.h5")
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0]: (0., 1.)}
tflite_model = converter.convert()

I am getting the error below:

ConverterError: See console for info. 2020-09-23 22:55:27.815650: F tensorflow/lite/toco/tooling_util.cc:1734] Array conv2d_3/Relu, which is an input to the MaxPool operator producing the output array max_pooling2d_3/MaxPool, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation. Fatal Python error: Aborted

MeghnaNatraj commented 4 years ago

@dtlam26 As you are not using QAT and instead using post-training quantization, you need to provide a representative_dataset in order to quantize the model.

Modify the steps in your code as:

.
.

model.save('/tmp/temp.h5')

converter = tf.lite.TFLiteConverter.from_keras_model_file("/tmp/temp.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

Refer to post-training quantization for more information.

dtlam26 commented 4 years ago

@dtlam26 As you are not using QAT and instead using post-training quantization, you need to provide a representative_dataset in order to quantize the model. Refer to post-training quantization for more information.

Yes, I know this is for post-training quantization, but my model is QAT, and it can't be converted for int8 inference on tf2.x. For tf1 it is OK.

matteorisso commented 4 years ago

@dtlam26

Yes, I know this is for post-training quantization, but my model is QAT, and it can't be converted for int8 inference on tf2.x. For tf1 it is OK.

Please can you tell me how you are able to perform QAT in tf1?

dtlam26 commented 4 years ago

@dtlam26 can you point to some resources for QAT for tf1.x and then quantization?

You can check out this Medium post and try to create a training graph and eval graph for QAT in 1.x: https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b This GitHub repo is also a good example: https://github.com/lusinlu/tensorflow_lite_guide

dtlam26 commented 4 years ago

@dtlam26

Yes, I know this is for post-training quantization, but my model is QAT, and it can't be converted for int8 inference on tf2.x. For tf1 it is OK.

Please can you tell me how you are able to perform QAT in tf1?

I have attached the sources above as an example. However, create_eval_graph will drop the last layer of your model from the graph. You have to add a dummy op to the graph, for example tf.maximum(output, 1e-27) for regression problems.
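A rough, untested TF 1.x sketch of that train-graph/eval-graph flow, in the spirit of the Medium post linked above; `build_keras_model()` and the weights path are hypothetical placeholders:

import tensorflow as tf  # TF 1.x
from tensorflow.keras import backend as K

# Training graph: insert fake-quant nodes, then train as usual.
train_sess = tf.Session()
K.set_session(train_sess)
with train_sess.graph.as_default():
    model = build_keras_model()  # hypothetical model builder
    tf.contrib.quantize.create_training_graph(train_sess.graph, quant_delay=0)
    train_sess.run(tf.global_variables_initializer())
    # model.fit(...), then model.save_weights('qat_weights.h5')

# Eval graph: rebuild the model, add the dummy op so the last layer is kept,
# then create the eval graph for export.
eval_sess = tf.Session()
K.set_session(eval_sess)
with eval_sess.graph.as_default():
    K.set_learning_phase(0)
    model = build_keras_model()
    output = tf.maximum(model.output, 1e-27)  # dummy op workaround described above
    tf.contrib.quantize.create_eval_graph(eval_sess.graph)
    model.load_weights('qat_weights.h5')
    # ...freeze eval_sess.graph and convert it with the TF 1.x TFLite converter.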

hangrymoon01 commented 4 years ago

@dtlam26 Thanks for the resources. @Mattrix00 I also found this notebook that is working for me https://colab.research.google.com/drive/15itdlIyLmXISK6SDAzAFGUgjatfVr0Yq

matteorisso commented 4 years ago

@hangrymoon01, @dtlam26 thank you so much for your help !

MeghnaNatraj commented 4 years ago

Marking this as resolved due to inactivity. Feel free to reopen it if the issue persists.

google-ml-butler[bot] commented 4 years ago

Are you satisfied with the resolution of your issue?

MeghnaNatraj commented 3 years ago

UPDATE

You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF version 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well).

You do not require any workaround, i.e., you don't have to use TF 1.x.

To verify that your TF version supports this, run the following code and check if it runs successfully:

import tensorflow as tf
assert tf.__version__[:3] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)
msokoloff1 commented 3 years ago

UPDATE

You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF version 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well).

You will not require any workaround, i.e, you don't have to use TF 1.x

To verify that your TF version supports this, run the following code and check if runs successfully:

import tensorflow as tf
assert tf.__version__[:2] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)

For those of you mindlessly typing this into your terminal: it should be the following to check compatibility.

import tensorflow as tf
assert tf.__version__[:3] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)
MeghnaNatraj commented 3 years ago

@msokoloff1 Thank you for that fix! I've updated all the comments above.

msokoloff1 commented 3 years ago

@MeghnaNatraj using tf 2.4.0rc2 I am still facing the issue "Quantization not yet supported for op: DEQUANTIZE". I am simply using the tf.keras.applications vgg16 network with full model quantization from the tfmot library. Do you have an example of this working with tf 2.4?

Just to make sure I wasn't doing something wrong locally, I get the exact same error if I use the notebook that you linked to above in the original post. The only change I made to the notebook was upgrading to 2.4.0rc0 and I still get the same error.

MeghnaNatraj commented 3 years ago

@msokoloff1 Could you post a link to an end-to-end colab notebook so I can reproduce this issue?

msokoloff1 commented 3 years ago

@MeghnaNatraj https://colab.research.google.com/gist/msokoloff1/f7d71b73d11adcbcb9f2420503540043/qat-bad-accuracy.ipynb#scrollTo=hMaYuw5AD8qt

It is exactly the same notebook that you posted at the start of this issue with the tensorflow version pinned to tf 2.4.0-rc0

anilsathyan7 commented 3 years ago

@msokoloff1 @MeghnaNatraj I faced a similar issue in tf 2.4.0-rc0. I even tried the latest source for tf and tfmot, but the issue persists. QAT with tf.keras produces quantize and dequantize layers, and we are unable to convert them to full integer quantization models, even after using post-training quantization on top of it.

Is there any other workarounds?

MeghnaNatraj commented 3 years ago

@anilsathyan7 could you also post a colab just as @msokoloff1 did before? It will help us debug this better.

There are no workarounds yet, we will get back to you as soon as we find one.

anilsathyan7 commented 3 years ago

Here is a modified version of the post training quantization tf example with QAT: https://colab.research.google.com/drive/1aqK5Sd1hy1o55Y1t1MUYQLMgiBwmo2VD?usp=sharing

carllhsiung commented 3 years ago

I used the latest tf-nightly and the following error was fixed:

RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'.

TensorFlow version: tf-nightly (2.5.0-dev20201130) My gist

anilsathyan7 commented 3 years ago

Hey @carllhsiung, I think your resulting model is 'quantization aware' but not quantized (i.e. the weights are float32 instead of int8). Your input and outputs are still in float32 format. Can you check with the other two links?

carllhsiung commented 3 years ago

Your input and outputs are still in float32 form

Hi @anilsathyan7, I tested the normal and quantized models. The quantized model is invalid, as you mentioned.

Please check the updated gist

RuntimeError: tensorflow/lite/kernels/quantize.cc:113 affine_quantization->scale->size == 1 was not true.Node number 0 (QUANTIZE) failed to prepare.

BTW, converter.inference_input_type and converter.inference_output_type can be tf.float32 for a quantized model.

For example, using float input/output, you will get:

>>> print(interpreter.get_input_details())
[{
    'name': 'conv2d_input',
    'index': 0,
    'shape': array([1, 64, 64, 3], dtype = int32),
    'shape_signature': array([-1, 64, 64, 3], dtype = int32),
    'dtype': < class 'numpy.float32' > ,
    'quantization': (0.0, 0),
    'quantization_parameters': {
        'scales': array([], dtype = float32),
        'zero_points': array([], dtype = int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

>>> print(interpreter.get_output_details())
[{
    'name': 'Identity',
    'index': 13,
    'shape': array([1, 12, 12, 64], dtype = int32),
    'shape_signature': array([-1, 12, 12, 64], dtype = int32),
    'dtype': < class 'numpy.float32' > ,
    'quantization': (0.0, 0),
    'quantization_parameters': {
        'scales': array([], dtype = float32),
        'zero_points': array([], dtype = int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

Using uint8/int8 you will get:

>>> print(interpreter.get_input_details())
[{
    'name': 'conv2d_input',
    'index': 0,
    'shape': array([1, 64, 64, 3], dtype = int32),
    'shape_signature': array([-1, 64, 64, 3], dtype = int32),
    'dtype': < class 'numpy.uint8' > ,
    'quantization': (3.921568847431445e-09, 127),
    'quantization_parameters': {
        'scales': array([3.921569e-09], dtype = float32),
        'zero_points': array([127], dtype = int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

>>> print(interpreter.get_output_details())
[{
    'name': 'Identity',
    'index': 13,
    'shape': array([1, 12, 12, 64], dtype = int32),
    'shape_signature': array([-1, 12, 12, 64], dtype = int32),
    'dtype': < class 'numpy.uint8' > ,
    'quantization': (0.0470588244497776, 128),
    'quantization_parameters': {
        'scales': array([0.04705882], dtype = float32),
        'zero_points': array([128], dtype = int32),
        'quantized_dimension': 0
    },
    'sparsity_parameters': {}
}]

In other words, you need to quantize the input and dequantize the output with the scales and zero points yourself if using uint8/int8 input and output.
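A small sketch of that manual (de)quantization, assuming `interpreter` is the one created from the uint8 model in the gist above (shapes follow the printed details):

import numpy as np

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]
scale_in, zero_in = input_details['quantization']
scale_out, zero_out = output_details['quantization']

interpreter.allocate_tensors()

# Quantize a float input into uint8 using the input scale/zero point.
float_input = np.random.rand(1, 64, 64, 3).astype(np.float32)  # example input
quant_input = np.clip(np.round(float_input / scale_in + zero_in), 0, 255).astype(np.uint8)
interpreter.set_tensor(input_details['index'], quant_input)
interpreter.invoke()

# Dequantize the uint8 output back to float with the output scale/zero point.
quant_output = interpreter.get_tensor(output_details['index'])
float_output = (quant_output.astype(np.float32) - zero_out) * scale_out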

I can successfully convert to a quantized uint8 tflite model with 2.4.0-rc1 and converter._experimental_new_quantizer = True.

Please check the tf-2.4-rc1 gist

anilsathyan7 commented 3 years ago

Converting and saving a quantized tflite with QAT using float inputs and outputs was not an issue even in tf 2.3 anyway... The issue was with full integer quantization in QAT, so that we can use the models with hardware accelerators (int).

Also, if you have real training data, you need to run fit/fine-tune on the quantization-aware model before conversion to get a proper quantized model.

Setting converter._experimental_new_quantizer = True seems to be key here... Thanks, it worked with the tf-nightly version as well!
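For reference, a minimal sketch of where that flag goes (it is a private, experimental attribute, so treat it as version-specific; `q_aware_model` is assumed from the earlier examples):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
converter._experimental_new_quantizer = True  # experimental flag mentioned above
tflite_quant_model = converter.convert()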

MeghnaNatraj commented 3 years ago

For QAT models, you don't need a representative dataset. Also, full integer quantization support for QAT models (full integer with (default float32)/uint8/int8 input/output) is available from TF 2.4 as shown below:

Gist

!pip uninstall -q -y tensorflow tensorflow-gpu
!pip install tensorflow==2.4
!pip install -q tensorflow-model-optimization

import tensorflow as tf
print(tf.__version__)

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def get_model(is_qat=False):
  (train_x, train_y) , (_, _) = tf.keras.datasets.mnist.load_data()
  train_x = train_x.astype('float32') / 255
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(10)
  ])
  if is_qat:
    model = tfmot.quantization.keras.quantize_model(model)
  model.compile(
      optimizer=tf.keras.optimizers.Adam(0.001),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
  )
  model.fit(train_x, train_y, batch_size=64, epochs=2, verbose=1)
  return model

## 1. Normal TF Model
model = get_model()

# 1a. Convert normal TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
normal_tf_model_quantized_tflite_model = converter.convert()

# 1b. Convert normal TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
normal_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()

## 2. QAT (Quantize Aware Trained) TF model
qat_model = get_model(is_qat=True)

# 2a. Convert QAT TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tf_model_quantized_tflite_model = converter.convert()

# 2b. Convert QAT TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
qat_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()
google-ml-butler[bot] commented 3 years ago

Are you satisfied with the resolution of your issue?

anilsathyan7 commented 3 years ago

@MeghnaNatraj In a real training example, shouldn't we run a fit/train with a train_images_subset between steps 2 and 2a/2b in order to maintain accuracy, as mentioned in the TF docs?

sayakpaul commented 3 years ago

@anilsathyan7 yes, you would want to actually train the model so that it can adjust to compensate for the information loss (induced by the precision loss).
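A small sketch of that fine-tuning step, reusing names from the gist above (`qat_model`, and the MNIST arrays assumed to be available as `train_x`/`train_y`); the subset size and epoch count are only illustrative:

# Fine-tune the quantization-aware model on a subset before converting (steps 2a/2b).
train_images_subset = train_x[:1000]
train_labels_subset = train_y[:1000]
qat_model.fit(train_images_subset, train_labels_subset,
              batch_size=64, epochs=1, validation_split=0.1)
# ...then run the conversion steps on the fine-tuned qat_model.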

MeghnaNatraj commented 3 years ago

@anilsathyan7 @sayakpaul yes! Thanks for pointing that out. I've updated the example to also include model training :)

dtlam26 commented 3 years ago

I have a little question about the 2.4 quantization: why is UpSampling2D not supported in tf2, while in tf1 it is accepted? RuntimeError: Layer up_sampling2d_10:<class 'tensorflow.python.keras.layers.convolutional.UpSampling2D'> is not supported.

anilsathyan7 commented 3 years ago

@dtlam26 I found that 8-bit quantization of UpSampling2D is supported with the latest tensorflow-model-optimization source.

dtlam26 commented 3 years ago

@dtlam26 I found that 8-bit quantization of UpSampling2D is supported with the latest tensorflow-model-optimization source.

According to your source, it seems they skip quantization inside the UpSampling2D layer and only quantize its output, as I supposed. I can create the same thing with a custom quantize config as well. It is just a surprise that they now provide quantization for this.

anilsathyan7 commented 3 years ago

@dtlam26 I'm not sure about the internal implementation, but I was able to get past that error and convert the model to tflite with QAT (UpSample2D / ResizeBilinear) and tf-nightly. The results (accuracy) seem to be fine when I test them on sample images. Anyway, they also mention this:

There are gaps between ResizeBilinear with FakeQuant and TFLite quantized ResizeBilinear op. It has a bit more quantization error than other ops in this test now.

upsample_quant

dtlam26 commented 3 years ago

@dtlam26 I'm not sure about the internal implementation, but I was able to get past that error and convert the model to tflite with QAT (UpSample2D / ResizeBilinear) and tf-nightly. The results (accuracy) seem to be fine when I test them on sample images. Anyway, they also mention this:

There are gaps between ResizeBilinear with FakeQuant and TFLite quantized ResizeBilinear op. It has a bit more quantization error than other ops in this test now.

upsample_quant

Yes, I can bypass that if I configure the quantization myself with no quantization of weights and activations. No need for the nightly build.

anilsathyan7 commented 3 years ago

@dtlam26 Can you share demo code/model?

dtlam26 commented 3 years ago

@dtlam26 Can you share demo code/model?

It is nothing much: you just need to follow the spec in the file you gave me and, in those lines, only quantize the output, not the weights and activations. I followed the TF guide for custom quantization. You can look into it here:

class UpSamplingQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):

    # Do not quantize any weights.
    def get_weights_and_quantizers(self, layer):
        return []

    # Do not quantize any activations.
    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        return

    def set_quantize_activations(self, layer, quantize_activations):
        return

    # Quantize only the layer output with an 8-bit moving-average quantizer.
    def get_output_quantizers(self, layer):
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}
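A hypothetical usage of the config above, following the custom-quantization pattern shown earlier in this thread (the layer sizes are illustrative):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, padding='same', activation='relu')(inputs)
# Annotate the UpSampling2D layer with the custom config so only its output is quantized.
x = tfmot.quantization.keras.quantize_annotate_layer(
    tf.keras.layers.UpSampling2D(), UpSamplingQuantizeConfig())(x)
outputs = tf.keras.layers.Conv2D(3, 3, padding='same')(x)

annotated = tfmot.quantization.keras.quantize_annotate_model(tf.keras.Model(inputs, outputs))
with tfmot.quantization.keras.quantize_scope({'UpSamplingQuantizeConfig': UpSamplingQuantizeConfig}):
    quant_aware_model = tfmot.quantization.keras.quantize_apply(annotated)
quant_aware_model.summary()
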
anilsathyan7 commented 3 years ago

@dtlam26 Can you share your minimal working example for the upsample layers? I just wanted to know how the weights and activations are supposed to get quantized for Upsample/ResizeBilinear layers.