Closed MeghnaNatraj closed 3 years ago
@sayakpaul
I can reproduce the issue and will get back to you when I resolve this. https://colab.research.google.com/gist/MeghnaNatraj/8458ad508f5355769a980400d4d9d194/qat-bad-accuracy.ipynb
Possible issue:
If you remove the TFLITE_BUILTINS_INT8 (don't enforce INT8) -- it works fine. The issue is that the model has 2 consecutive quantize at the beginning and 2 consecutive dequantize at the end (not sure why) -- probably because of the way tf.keras..mobilenetv2 is structured.
Couple of things to note (especially as you are involved in creating awesome tutorials! 👍 ): (The colab gist above has all the final code with the following suggested changes. NOTE: it also has some TODOs where I have simplified the code for faster execution)
- tensorflow_model_optimization and tensorflow-datasets, and uninstall tensorflow when you install tf-nightly
- If your model is for a basic tutorial and it's small, use full paths to keras APIs -- tf.keras.layers..... instead of from tf.keras.layers import *.
- tf.expand_dims(train_data_image, 0)) - as a result the shape increases to 5! (1, 32, 244, 244, 5) This causes some errors which are quite hard to debug (eg: PAD op dimensions exceeded >=4). You instead want (1, 244, 244, 5), hence we use the train_preprocessed data (check the 3rd point above), where the images don't yet have a batch dimension (shape (244, 244, 3)), for the representative_dataset function.
- Representative dataset - do not use next(iter(train_ds..)). This will make the image and label as a sequential list of items and cause failures. Instead use for image, _ in train_ds_preprocessed:
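The rank problem described above can be illustrated without TensorFlow. This is only a sketch of the shape arithmetic: `add_batch_dim` is a hypothetical helper that mirrors what `tf.expand_dims(x, 0)` does to a shape, and the sizes (batch of 32, 224x224 RGB images) are assumptions taken from the later replies in this thread.

```python
def add_batch_dim(shape):
    """Return the shape after inserting a leading axis of size 1,
    mirroring the effect of tf.expand_dims(x, 0) on a tensor's shape."""
    return (1,) + tuple(shape)


# One element of an already batched dataset (assumed batch size 32).
batched_element = (32, 224, 224, 3)
# One element of an unbatched, preprocessed dataset.
single_image = (224, 224, 3)

print(add_batch_dim(batched_element))  # (1, 32, 224, 224, 3)
print(add_batch_dim(single_image))     # (1, 224, 224, 3)
```

Only the second case gives the rank-4 input that the representative dataset is expected to yield.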
Thanks! First of all, the notebook that I had provided to you was meant for reproducing the issue I was facing. Before releasing it publicly, I sure would have modified it a bit.
A couple of things:
uninstall tensorflow when you install tf-nightly
Not sure about this, since when I run pip install tf-nightly at the beginning of a Colab session (before doing anything), the nightly version always gets reflected. Is there a specific reason you'd do this?
Sections can be: all imports and initial settings code
I respectfully disagree. I won't put the pip installs inside the same code block where I am importing dependencies. I sometimes break up longer code blocks, which you might have seen in my notebook as well. This is my personal preference. If "all training code" seems a bit unreadable to me, I'd break it into multiple cells, and the same applies to "all conversion code".
If your model is for a basic tutorial and it's small, use full paths to keras APIs -- tf.keras.layers..... instead of from tf.keras.layers import *
Okay, will keep in mind. But for a bit more complex tutorials/notebooks (in general), I don't think I'd follow it.
For data generation
In the original notebook, I first loaded the dataset from tfds, visualized it (which I think is a good practice), mapped the resizing step, then mapped the scaling step and batching-shuffling (shuffling not for the validation set). The only thing I'd change is merging the resizing step and scaling step inside a utility and map them.
If you emphasized the data generation point because I separated the steps into different cells, then yes, I won't generally do that.
Representative dataset
Agreed on the point. You might have mistakenly mentioned 5 channels (244, 244, 5), but note that in the flowers dataset the images come in 3 channels. I also see the problem in the representative_dataset_gen utility I used:
representative_images, _ = next(iter(train_ds))
def representative_dataset_gen():
    for image in representative_images:
        yield [tf.expand_dims(image, 0)]
If I had changed it to something like the following, I think it would have been fine.
representative_images, _ = next(iter(train_ds))
def representative_dataset_gen():
    for image in representative_images:
        yield [image]
I can confirm that in this way image would have a shape of (1, 224, 224, 3).
You might also consider adding these instructions in the documentation.
Representative dataset - do not use next(iter(train_ds..)). This will make the image and label as a sequential list of items and cause failures. Instead use for image, _ in train_ds_preprocessed:
Okay. But what if I'd want to restrict the number of instances in the representative dataset? Because for bigger datasets it's very difficult to have the entire training dataset streamed as the representative dataset. Would you suggest something like the following?
train_ds_unbatched = train_ds.unbatch() # train_ds already batched and preprocessed
def representative_dataset_gen():
    for i, (image, _) in enumerate(train_ds_unbatched):
        if i == 100: # let's say I want 100 samples only
            break
        yield [image]
Great points! Yes, you can choose what you think works best --- eg: I learnt many new things from your tutorial (loading TFDS datasets with the [:85%]..method! who knew! :))
For the representative dataset -- would .take() work? https://www.tensorflow.org/datasets/overview#as_numpy (ignore the as_numpy() part... just wanted to show an example usage)
Yes, take() should work as well. Having a note in the documentation on handling large datasets while creating the representative dataset would help. The representative dataset generation can get non-trivial at times and here's an example (which I am sure you are already aware of).
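As a framework-independent sketch of the same idea (capping the number of calibration samples), `itertools.islice` can bound any iterable of (image, label) pairs. The helper name `make_representative_dataset_gen` and the stand-in data are hypothetical; with a real `tf.data.Dataset` you would pass the unbatched, preprocessed dataset as `samples`, and each yielded image would still need the leading batch dimension of 1 discussed earlier.

```python
from itertools import islice


def make_representative_dataset_gen(samples, max_samples=100):
    """Return a generator function yielding at most `max_samples` inputs.

    `samples` is any re-iterable collection of (image, label) pairs.
    """
    def representative_dataset_gen():
        for image, _ in islice(samples, max_samples):
            # The converter expects a list with one entry per model input.
            yield [image]
    return representative_dataset_gen


# Stand-in data: strings instead of image tensors, purely for illustration.
fake_samples = [(f"image_{i}", i % 5) for i in range(250)]
gen = make_representative_dataset_gen(fake_samples, max_samples=100)
calibration_inputs = list(gen())
print(len(calibration_inputs))  # 100
```

The returned function can be assigned to `converter.representative_dataset` in the same way as the hand-written generators in this thread.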
Do you have any plan for solving this? I just encountered this issue... Here is my minimal reproducing code
```py
import tensorflow as tf
from tensorflow import keras
import tensorflow_model_optimization as tfmot

(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])
model.compile(optimizer='adam', loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))

q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(optimizer='adam', loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
q_aware_model.fit(train_images, train_labels, epochs=1, validation_data=(test_images, test_labels))
```
- Convert
```py
# Define the representative data.
def representative_data_gen():
    for input_value in tf.data.Dataset.from_tensor_slices(train_images.astype("float32")).batch(1).take(100):
        yield [input_value]

# Successful converting from model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

# Successful converting from model to uint8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()

# Successful converting from q_aware_model
q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_tflite_model = q_converter.convert()

# Fail converting from q_aware_model to uint8
q_converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()
```
Throws error
```py
RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'.
```
tf.lite.OpsSet.TFLITE_BUILTINS_INT8
```py
# Successful converting from model to uint8
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
tflite_model_quant = converter.convert()

q_converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
q_converter.optimizations = [tf.lite.Optimize.DEFAULT]
q_converter.representative_dataset = representative_data_gen
q_converter.inference_input_type = tf.uint8
q_converter.inference_output_type = tf.uint8
q_tflite_model_quant = q_converter.convert()
```
Throws error
```py
RuntimeError: Unsupported output type UINT8 for output tensor 'Identity' of type FLOAT32.
```
You can resolve the second error by removing the line q_converter.inference_output_type = tf.uint8. We're currently working on fixing this -- will post an update when it's done.
Thanks for your update. Yes, removing q_converter.inference_output_type = tf.uint8 makes the conversion succeed, but it leaves the output as float32.
interpreter = tf.lite.Interpreter(model_content=q_tflite_model_quant)
print('input: ', interpreter.get_input_details()[0]['dtype'])
# input: <class 'numpy.uint8'>
print('output: ', interpreter.get_output_details()[0]['dtype'])
# output: <class 'numpy.float32'>
What about the problem with RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'?
I have also encountered it when trying to convert my model. Is it because QAT does not fully support full-integer edge devices like Coral?
@MeghnaNatraj I have reconstructed a Resnet model from QAT guide https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide
This is my model, and QAT runs successfully with it, but when I try to convert it fully to uint8 or int8 for edge_tpu, I still get the problem: RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'
My model code:
from tensorflow import Tensor
from tensorflow.keras.layers import Input, Conv2D, ReLU, BatchNormalization,\
    Add, AveragePooling2D, Flatten, Dense, concatenate
from tensorflow.keras.models import Model, Sequential
import tensorflow as tf
import tensorflow_model_optimization as tfmot

LastValueQuantizer = tfmot.quantization.keras.quantizers.LastValueQuantizer
MovingAverageQuantizer = tfmot.quantization.keras.quantizers.MovingAverageQuantizer
quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_scope = tfmot.quantization.keras.quantize_scope

class DefaultBNQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    # Configure how to quantize weights.
    def get_weights_and_quantizers(self, layer):
        # return []
        return [(layer.weights[i], LastValueQuantizer(num_bits=8, symmetric=True, narrow_range=False, per_axis=False)) for i in range(2)]

    # Configure how to quantize activations.
    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        # # Add this line for each item returned in `get_weights_and_quantizers`
        # # , in the same order
        # layer.kernel = quantize_weights[0]
        # print(quantize_weights)
        layer.gamma = quantize_weights[0]
        layer.beta = quantize_weights[1]
        # layer.moving_mean = quantize_weights[2]
        # layer.moving_variance = quantize_weights[3]
        # pass

    def set_quantize_activations(self, layer, quantize_activations):
        # Add this line for each item returned in `get_activations_and_quantizers`
        # , in the same order.
        pass

    # Configure how to quantize outputs (may be equivalent to activations).
    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

def relu_bn(inputs: Tensor) -> Tensor:
    relu = ReLU()(inputs)
    bn = quantize_annotate_layer(BatchNormalization(), DefaultBNQuantizeConfig())(relu)
    return bn

def residual_block(x: Tensor, downsample: bool, filters: int, kernel_size: int = 3) -> Tensor:
    y = Conv2D(kernel_size=kernel_size,
               strides=(1 if not downsample else 2),
               filters=filters,
               padding="same")(x)
    y = relu_bn(y)
    y = Conv2D(kernel_size=kernel_size,
               strides=1,
               filters=filters,
               padding="same")(y)
    if downsample:
        x = Conv2D(kernel_size=1,
                   strides=2,
                   filters=filters,
                   padding="same")(x)
    out = Add()([x, y])
    out = relu_bn(out)
    return out

def create_res_net_quantize(inputs,embedding_size,quantize=True):
    quantize = True
    num_filters = 64
    t = quantize_annotate_layer(BatchNormalization(), DefaultBNQuantizeConfig())(inputs)
    t = Conv2D(kernel_size=3,
               strides=1,
               filters=num_filters,
               padding="same")(t)
    t = relu_bn(t)
    num_blocks_list = [2, 5, 5, 2]
    for i in range(len(num_blocks_list)):
        num_blocks = num_blocks_list[i]
        for j in range(num_blocks):
            t = residual_block(t, downsample=(j==0 and i!=0), filters=num_filters)
        num_filters *= 2
    t = AveragePooling2D(4)(t)
    t = Flatten()(t)
    outputs = Dense(embedding_size)(t)
    model = quantize_annotate_model(Model(inputs, outputs))
    with quantize_scope(
        {'DefaultBNQuantizeConfig': DefaultBNQuantizeConfig}):
        # Use `quantize_apply` to actually make the model quantization aware.
        quant_aware_model = tfmot.quantization.keras.quantize_apply(model)
    quant_aware_model.summary()
    return quant_aware_model
I used the same method as @leondgarse did
Is there any update on this issue? I am facing the same error while trying to convert a QAT model to INT8.
As far as I understand, TF 2.0 quantization does not yet support full integer inference. Try QAT with TF 1.x, where everything works smoothly.
Thanks @dtlam26 for the response. Strangely, it turns out that for MobileNetV1 the conversion is working fine. For MobileNetV2, as @MeghnaNatraj pointed out in earlier comment, there are two consecutive QUANTIZE and DEQUANTIZE nodes for inputs and outputs respectively which might be causing the issue. Attaching the screenshots. INPUT: OUTPUT:
Yes, you can still quantize the model with other types of quantization, just not int8 (I mean Builtin_int8). Furthermore, you can check out this link for details on bypassing the QAT problem on TF2: https://github.com/tensorflow/model-optimization/issues/377#issuecomment-625093345 According to it, declaring a double input to the model is why QAT for MobileNetV2 ends up with a double quantize and dequantize.
Any news ??
@dtlam26 can you point to some resources for QAT in TF 1.x followed by quantization? I am trying the following code (without QAT) but getting an error on TF 1.15:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = 20
model = Sequential([
    layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(256, 256, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
model.fit(train_generator, epochs=1, steps_per_epoch=100)
model.save('/tmp/temp.h5')
converter = tf.lite.TFLiteConverter.from_keras_model_file("/tmp/temp.h5")
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
input_arrays = converter.get_input_arrays()
converter.quantized_input_stats = {input_arrays[0] : (0., 1.)}
tflite_model = converter.convert()
I am getting below error: ConverterError: See console for info. 2020-09-23 22:55:27.815650: F tensorflow/lite/toco/tooling_util.cc:1734] Array conv2d_3/Relu, which is an input to the MaxPool operator producing the output array max_pooling2d_3/MaxPool, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation. Fatal Python error: Aborted
@dtlam26 As you are not using QAT and instead using post-training quantization, you need to provide a representative_dataset in order to quantize the model.
Modify the steps in your code as:
.
.
model.save('/tmp/temp.h5')
converter = tf.lite.TFLiteConverter.from_keras_model_file("/tmp/temp.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()
Refer to post-training quantization for more information.
Yes, I know this for post-training quantization, but my model uses QAT, and it can't run int8 inference on TF 2.x. With TF1 it is OK.
@dtlam26
Please can you tell me how you are able to perform QAT in tf1 ??
You can check out this Medium post and try to create a training graph and an eval graph for QAT in TF 1.x: https://medium.com/analytics-vidhya/mobile-inference-b943dc99e29b This GitHub repo is also a good example: https://github.com/lusinlu/tensorflow_lite_guide
I have attached the source as an example. However, creating the eval graph drops the last layer of your model from the graph, so you have to add a dummy op to the graph, for example tf.maximum(output, 1e-27) for regression problems.
@dtlam26 Thanks for the resources. @Mattrix00 I also found this notebook that is working for me https://colab.research.google.com/drive/15itdlIyLmXISK6SDAzAFGUgjatfVr0Yq
@hangrymoon01, @dtlam26 thank you so much for your help !
Marking as resolved due to inactivity. Feel free to reopen it if the issue persists.
UPDATE
You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF version 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well).
You will not require any workaround, i.e., you don't have to use TF 1.x.
To verify that your TF version supports this, run the following code and check that it runs successfully:
import tensorflow as tf
assert tf.__version__[:3] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)
UPDATE
You can now fully quantize QAT models trained in any TF 2.x version. However, this feature is only available from TF version 2.4.0-rc0 onwards (and will be available in the final TF 2.4 release as well). You will not require any workaround, i.e., you don't have to use TF 1.x.
To verify that your TF version supports this, run the following code and check that it runs successfully:
import tensorflow as tf
assert tf.__version__[:2] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)
For those of you mindlessly typing this into your terminal: it should be the following to check compatibility
import tensorflow as tf
assert tf.__version__[:3] == "2.4", 'Your TF version ({}), does not support full quantization of QAT models. Upgrade to a TF 2.4 version (2.4.0-rc0, 2.4.0-rc1...2.4) or above'.format(tf.__version__)
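Note that the string slice above only accepts versions beginning with "2.4", so it would also reject later releases such as 2.5 or 2.10 even though the message says "or above". A possible alternative, sketched here with a hypothetical helper name, compares the numeric major and minor components instead. It deliberately ignores pre-release suffixes, so nightly builds that predate 2.4.0-rc0 (for example 2.4.0-dev builds) would still pass this check.

```python
import re


def supports_full_qat_quantization(version, minimum=(2, 4)):
    """Return True if the major.minor part of `version` is >= `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= minimum


# Example version strings; in practice you would pass tf.__version__.
for v in ["2.3.1", "2.4.0-rc0", "2.4.1", "2.5.0-dev20201130", "2.10.0"]:
    print(v, supports_full_qat_quantization(v))
```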
@msokoloff1 Thank you for that fix! I've updated all the comments above.
@MeghnaNatraj using tf 2.4.0rc2 I am still facing the issue "Quantization not yet supported for op: DEQUANTIZE". I am simply using the tf.keras.applications vgg16 network with full model quantization from the tfmot library. Do you have an example of this working with tf 2.4?
Just to make sure I wasn't doing something wrong locally, I get the exact same error if I use the notebook that you linked to above in the original post. The only change I made to the notebook was upgrading to 2.4.0rc0 and I still get the same error.
@msokoloff1 Could you post a link to an end-to-end colab notebook so I can reproduce this issue?
It is exactly the same notebook that you posted at the start of this issue with the tensorflow version pinned to tf 2.4.0-rc0
@msokoloff1 @MeghnaNatraj I faced a similar issue in tf 2.4.0-rc0. I even tried the latest source for tf and tfmot, but the issue persists. QAT with tf.keras produces quantize and dequantize layers, and we are unable to convert them to full integer quantization models, even after using post-training quantization on top of it.
Are there any other workarounds?
@anilsathyan7 could you also post a colab just as @msokoloff1 did before? It will help us debug this better.
There are no workarounds yet, we will get back to you as soon as we find one.
Here is a modified version of the post training quantization tf example with QAT: https://colab.research.google.com/drive/1aqK5Sd1hy1o55Y1t1MUYQLMgiBwmo2VD?usp=sharing
I used latest tf-nightly and the error was fixed.
RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'.
TensorFlow version: tf-nightly (2.5.0-dev20201130) My gist
Hey @car1hsiung , i think your resulting model is 'quantization aware' but not quantized (e.g. the weights are float32 instead of int8). Your input and outputs are still in float32 format. Can you check with the other two links ??
Hi @anilsathyan7 , I test the normal and quantized models. The quantized model is invalid as you mentioned.
Please check the updated gist
RuntimeError: tensorflow/lite/kernels/quantize.cc:113 affine_quantization->scale->size == 1 was not true.Node number 0 (QUANTIZE) failed to prepare.
BTW, the converter.inference_input_type and converter.inference_output_type can be tf.float32 for a quantized model.
For example, using float input/output, you will get:
>>> print(interpreter.get_input_details())
[{
'name': 'conv2d_input',
'index': 0,
'shape': array([1, 64, 64, 3], dtype = int32),
'shape_signature': array([-1, 64, 64, 3], dtype = int32),
'dtype': < class 'numpy.float32' > ,
'quantization': (0.0, 0),
'quantization_parameters': {
'scales': array([], dtype = float32),
'zero_points': array([], dtype = int32),
'quantized_dimension': 0
},
'sparsity_parameters': {}
}]
>>> print(interpreter.get_output_details())
[{
'name': 'Identity',
'index': 13,
'shape': array([1, 12, 12, 64], dtype = int32),
'shape_signature': array([-1, 12, 12, 64], dtype = int32),
'dtype': < class 'numpy.float32' > ,
'quantization': (0.0, 0),
'quantization_parameters': {
'scales': array([], dtype = float32),
'zero_points': array([], dtype = int32),
'quantized_dimension': 0
},
'sparsity_parameters': {}
}]
Using uint8/int8 you will get:
>>> print(interpreter.get_input_details())
[{
'name': 'conv2d_input',
'index': 0,
'shape': array([1, 64, 64, 3], dtype = int32),
'shape_signature': array([-1, 64, 64, 3], dtype = int32),
'dtype': < class 'numpy.uint8' > ,
'quantization': (3.921568847431445e-09, 127),
'quantization_parameters': {
'scales': array([3.921569e-09], dtype = float32),
'zero_points': array([127], dtype = int32),
'quantized_dimension': 0
},
'sparsity_parameters': {}
}]
>>> print(interpreter.get_output_details())
[{
'name': 'Identity',
'index': 13,
'shape': array([1, 12, 12, 64], dtype = int32),
'shape_signature': array([-1, 12, 12, 64], dtype = int32),
'dtype': < class 'numpy.uint8' > ,
'quantization': (0.0470588244497776, 128),
'quantization_parameters': {
'scales': array([0.04705882], dtype = float32),
'zero_points': array([128], dtype = int32),
'quantized_dimension': 0
},
'sparsity_parameters': {}
}]
In other words, you need to quantize input and dequantize output with scales and zero points by yourself if using uint8/int8 input and output.
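A minimal sketch of that manual step, using the affine mapping real = scale * (q - zero_point). The helper names `quantize` and `dequantize` are hypothetical, the output parameters are copied from the uint8 listing above, and the raw output values are made up for illustration. With a real interpreter you would read (scale, zero_point) from the 'quantization' entry of get_input_details() / get_output_details().

```python
def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to integers: q = round(x / scale) + zero_point,
    clamped to [qmin, qmax] (uint8 range by default)."""
    return [min(qmax, max(qmin, round(x / scale) + zero_point)) for x in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Output quantization parameters from the uint8 listing above.
out_scale, out_zero_point = 0.0470588244497776, 128

# Made-up raw uint8 model outputs, for illustration only.
raw_output = [128, 149, 255, 0]
real_output = dequantize(raw_output, out_scale, out_zero_point)
print(real_output)

# Going the other way recovers the integers.
print(quantize(real_output, out_scale, out_zero_point))
```

For int8 input/output the same functions apply with qmin=-128 and qmax=127.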
I can successfully convert to a quantized uint8 tflite model with 2.4.0-rc1 by setting converter._experimental_new_quantizer = True
Please check the tf-2.4-rc1 gist
Converting and saving a quantized tflite model with QAT using float inputs and outputs was not an issue even in tf 2.3 anyway ... The issue was with full integer quantization in QAT, which is needed to use the models with integer-only hardware accelerators.
Also if you have real train data you need to run fit/finetune on the qaware model before conversion to get proper quantized model.
Setting converter._experimental_new_quantizer = True, seems to be key here... Thanks, it worked with tf-nightly version also !!!
For QAT models, you don't need a representative dataset. Also, full integer quantization support for QAT models (full integer with (default float32)/uint8/int8 input/output) is available from TF 2.4 as shown below:
!pip uninstall -q -y tensorflow tensorflow-gpu
!pip install tensorflow==2.4
!pip install -q tensorflow-model-optimization
import tensorflow as tf
print(tf.__version__)
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot
def get_model(is_qat=False):
    (train_x, train_y) , (_, _) = tf.keras.datasets.mnist.load_data()
    train_x = train_x.astype('float32') / 255
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128,activation='relu'),
        tf.keras.layers.Dense(10)
    ])
    if is_qat:
        model = tfmot.quantization.keras.quantize_model(model)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(0.001),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy()]
    )
    model.fit(train_x, train_y, batch_size=64, epochs=2, verbose=1)
    return model
## 1. Normal TF Model
model = get_model()
# 1a. Convert normal TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
normal_tf_model_quantized_tflite_model = converter.convert()
# 1b. Convert normal TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
    for i in range(10):
        yield [np.random.uniform(low=0.0, high=1.0, size=(1, 28, 28)).astype(np.float32)]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_dataset_gen
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
normal_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()
## 2. QAT (Quantize Aware Trained) TF model
qat_model = get_model(is_qat=True)
# 2a. Convert QAT TF model to INT8 quantized TFLite model (default float32 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
qat_tf_model_quantized_tflite_model = converter.convert()
# 2b. Convert QAT TF model to INT8 quantized TFLite model (uint8 input/output)
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
qat_tf_model_quantized_with_uint8_io_tflite_model = converter.convert()
@MeghnaNatraj In a real training example, shouldn't we run a fit/train with a train_images_subset between steps 2 and 2.a/2.b in order to maintain accuracy, as mentioned in the TF docs?
@anilsathyan7 yes, you would want to actually train the model so that it can adjust to compensate for the information loss (induced by the precision loss).
@anilsathyan7 @sayakpaul yes! Thanks for pointing that out. I've updated the example to also include model training :)
I have a small question about the 2.4 quantization: why is UpSampling2D not supported in TF2 while it is accepted in TF1?
RuntimeError: Layer up_sampling2d_10:<class 'tensorflow.python.keras.layers.convolutional.UpSampling2D'> is not supported.
@dtlam26 I found that 8-bit quantization of UpSampling2D is supported with the latest tf-optimization source.
According to your source, it seems they skip quantization inside the UpSampling2D layer and only quantize the output, I suppose. I can reproduce this with a custom quantize config as well. It was just a surprise, since they used to provide quantization for this layer.
@dtlam26 I'm not sure about the internal implementation, but I was able to get past that error and convert the model to tflite with QAT (UpSampling2D, ResizeBilinear) and tf-nightly. The results (accuracy) seem to be fine when I test them on sample images. Anyway, they also mention this:
# There are gaps between ResizeBilinear with FakeQuant and
# TFLite quantized ResizeBilinear op. It has a bit more quantization
# error than other ops in this test now.
Yes, I can bypass that if I self configure the quantization with no quantization in weights and activations as well. No need for the nightly
@dtlam26 Can you share demo code/model?
It is nothing much: you just need to follow the spec in those lines of the file you gave me and quantize only the output, not the weights and activations. I followed the TF guide for custom quantization. You can look into it here
class UpSamplingQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        return

    def set_quantize_activations(self, layer, quantize_activations):
        return

    def get_output_quantizers(self, layer):
        return [tfmot.quantization.keras.quantizers.MovingAverageQuantizer(
            num_bits=8, per_axis=False, symmetric=False, narrow_range=False)]

    def get_config(self):
        return {}
@dtlam26 Can you share your minimum working example for upsample layers? I just wanted to know how the 'weights and activations' are supposed to get quantized for Upsample/ResizeBilinear Layers.
ISSUE
System information
TensorFlow version (use command below): 2.4.0-dev20200728
Describe the current behavior
Error converting quantize aware trained tensorflow model to a fully integer quantized tflite model - error:
RuntimeError: Quantization not yet supported for op: 'DEQUANTIZE'
Describe the expected behavior
Convert successfully
Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook. https://colab.research.google.com/gist/sayakpaul/8c8a1d7c94beca26d93b67d92a90d3f0/qat-bad-accuracy.ipynb