tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Support for Multiply layer #733

Open FSet89 opened 3 years ago

FSet89 commented 3 years ago

System information

Motivation

The implementation of some models (e.g. SENet) requires the Multiply layer.

Describe the feature

I tried to quantize a model that includes a Squeeze-and-Excitation block, but I got this error: "Layer multiply:<class 'tensorflow.python.keras.layers.merge.Multiply'> is not supported. You can quantize this layer by passing a tfmot.quantization.keras.QuantizeConfig instance to the quantize_annotate_layer API." It would be very useful to have this layer supported.

Describe how the feature helps achieve the use case

It would be possible to fully quantize models that include this layer.

james77777778 commented 1 year ago

tf.keras.layers.Multiply is commonly used, but tfmot does not support it. It is missing from the default 8-bit quantize registry: https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py#L162

Is there any potential risk in adding support for Multiply?
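For context on that question, here is a minimal sketch in plain Python (no TensorFlow or tfmot) of what 8-bit fake quantization of an elementwise product involves. The helper names `fake_quantize` and `quantized_multiply`, the value ranges, and the sample inputs are all assumptions made for illustration; this is not how tfmot or TFLite implement it. The point is only that the product is rounded a second time on its own output range, which is presumably something a dedicated Multiply entry would need to handle.

```python
def fake_quantize(value, range_min, range_max, num_bits=8):
    """Clip `value` to [range_min, range_max] and round it to a uniform grid."""
    levels = 2 ** num_bits - 1
    scale = (range_max - range_min) / levels
    clipped = min(max(value, range_min), range_max)
    return round((clipped - range_min) / scale) * scale + range_min


def quantized_multiply(a, b, a_range, b_range, out_range):
    """Fake-quantize both inputs, multiply them, then fake-quantize the product."""
    qa = fake_quantize(a, *a_range)
    qb = fake_quantize(b, *b_range)
    return fake_quantize(qa * qb, *out_range)


# Squeeze-and-Excitation style product: a feature value times a sigmoid gate.
# All numbers below are made up for the illustration.
feature, gate = 2.37, 0.42
feature_range, gate_range = (-6.0, 6.0), (0.0, 1.0)
out_range = (-6.0, 6.0)  # widest possible product of the two ranges above

exact = feature * gate
approx = quantized_multiply(feature, gate, feature_range, gate_range, out_range)
error = abs(exact - approx)
print(f"exact={exact:.4f} approx={approx:.4f} abs_error={error:.4f}")
```

In this toy setup the error comes from three rounding steps (two inputs, one output). A no-op config for Multiply skips the output step during training, so QAT would not see that part of the error.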

I can bypass the error and run QAT with the code below:

import tensorflow_model_optimization as tfmot
from keras import layers, models

# from
# https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_configs.py
class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}

def apply_quant_config(layer: layers.Layer):
    if 'multiply' in layer.name:
        return tfmot.quantization.keras.quantize_annotate_layer(layer, quantize_config=NoOpQuantizeConfig())
    return layer

model = get_keras_model(...)  # user defined
annotate_model = models.clone_model(model, clone_function=apply_quant_config)
annotate_model = tfmot.quantization.keras.quantize_annotate_model(annotate_model)
with tfmot.quantization.keras.quantize_scope({'NoOpQuantizeConfig': NoOpQuantizeConfig}):
    annotate_model: models.Model = tfmot.quantization.keras.quantize_apply(annotate_model)

In my project, I get the following results (the task is image prediction, evaluated with RMSE; lower is better):

| Method | RMSE |
|---|---|
| FP32 | 70 |
| PTQ | 105 |
| QAT (non-optimized) | 75 |
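To put these numbers in relative terms, a small sketch that computes each method's RMSE increase over the FP32 baseline. The values are copied from the results above; the variable names are just for illustration.

```python
# RMSE values taken from the results reported above.
rmse = {"FP32": 70.0, "PTQ": 105.0, "QAT (non-optimized)": 75.0}

baseline = rmse["FP32"]
# Percentage increase in RMSE relative to the FP32 model.
relative_increase = {
    method: (value - baseline) / baseline * 100.0
    for method, value in rmse.items()
}

for method, pct in relative_increase.items():
    print(f"{method}: RMSE {rmse[method]:.0f} ({pct:+.1f}% vs FP32)")
```

So PTQ is 50% worse than FP32 on this metric, while QAT with the no-op workaround is about 7% worse.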

I think the result is not bad, but it took some time to figure out how to make QAT work, and I'm curious whether I'm missing something...

Some links for the issue:

Thanks!