sony / model_optimization

Model Compression Toolkit (MCT) is an open source project for neural network model optimization under efficient, constrained hardware. This project provides researchers, developers, and engineers advanced quantization and compression tools for deploying state-of-the-art neural networks.
https://sony.github.io/model_optimization/
Apache License 2.0

Integration with TFLite #528

Closed jasonravagli closed 1 year ago

jasonravagli commented 1 year ago

Issue Type

Documentation

Source

pip (model-compression-toolkit)

MCT Version

1.7.1

OS Platform and Distribution

No response

Python version

No response

Describe the issue

As mentioned in #273, MCT quantization produces fake-quantized models with float32 weights and currently does not support exporting models with int8 weights. However, from the code and the documentation it is not clear to me how MCT integrates with TFLite to produce models with integer weights for deployment on edge devices.

TFLite full-integer quantization of a model with float weights requires its own PTQ pass with a representative dataset. Would running that pass on top of an MCT-quantized model undo or distort the quantization that MCT has already applied? Are there other ways to convert an MCT-quantized Keras model to a TFLite model with integer weights?
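
For reference, this is the standard TFLite full-integer PTQ flow I am referring to (a minimal sketch; model and representative_dataset stand in for the actual objects and are not MCT-specific):

import tensorflow as tf

# Standard TFLite full-integer post-training quantization (sketch).
# `model` and `representative_dataset` are placeholders for the objects described above.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # weights and activations are re-quantized by TFLite, not by MCT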

Expected behaviour

No response

Code to reproduce the issue

import tensorflow as tf
import model_compression_toolkit as mct

def get_tpc() -> mct.target_platform.TargetPlatformCapabilities:
    tp = mct.target_platform
    default_config = tp.OpQuantizationConfig(
        activation_quantization_method=tp.QuantizationMethod.POWER_OF_TWO,
        weights_quantization_method=tp.QuantizationMethod.POWER_OF_TWO,
        activation_n_bits=8,
        weights_n_bits=8,
        weights_per_channel_threshold=True,
        enable_weights_quantization=True,
        enable_activation_quantization=True,
        quantization_preserving=False,
        fixed_scale=1.0,
        fixed_zero_point=0,
        weights_multiplier_nbits=0,
    )

    default_configuration_options = tp.QuantizationConfigOptions([default_config])
    tp_model = tp.TargetPlatformModel(default_configuration_options)

    tpc = tp.TargetPlatformCapabilities(tp_model)

    return tpc

def get_representative_dataset(dataset):
    def representative_dataset():
        for i in range(len(dataset)):
            x, _ = dataset[i]
            yield [x]

    return representative_dataset

model = ... # load the model
dataset = ... # load the dataset

representative_dataset = get_representative_dataset(dataset)

qat_model, _, _ = mct.keras_quantization_aware_training_init(
    model,
    representative_dataset,
    core_config=mct.CoreConfig(),
    target_platform_capabilities=get_tpc(),
)

# ...
# Fine-tune the model (QAT) using keras fit()
# ...

quantized_model = mct.keras_quantization_aware_training_finalize(qat_model)

# Convert the model to TFLite and save it
converter = tf.lite.TFLiteConverter.from_keras_model(quantized_model)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
tflite_model = converter.convert() # Error: a representative dataset must be specified

Log output

No response

reuvenperetz commented 1 year ago

Hello @jasonravagli, Currently we only support exporting TensorFlow models with fake-quantized weights. We do not recommend applying TFLite optimizations on top of such a model, as they may alter the weights that MCT has already quantized. Feel free to let us know if you have further questions or concerns.
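
In the meantime, if you just need a .tflite file for deployment, one option (a sketch on my part, not an official MCT flow) is to convert the fake-quantized Keras model without enabling any converter optimizations, so the float32 weights produced by MCT are kept exactly as they are:

import tensorflow as tf

# Sketch (assumption, not an official MCT export path): convert the fake-quantized
# Keras model to TFLite with the default (float) converter settings, so the
# converter does not re-quantize the weights MCT produced.
converter = tf.lite.TFLiteConverter.from_keras_model(quantized_model)
tflite_model = converter.convert()

with open('mct_fake_quant_model.tflite', 'wb') as f:
    f.write(tflite_model)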

jasonravagli commented 1 year ago

Thank you for your quick response. If I understand correctly, there is currently no direct way to convert a quantized TensorFlow model generated by MCT into a TFLite model. MCT already offers very useful functionality, and adding this capability would make it much easier to deploy models on edge devices.

Thank you again for your time.

reuvenperetz commented 1 year ago

Hi @jasonravagli, A new method for exporting int8 TFLite models from MCT has recently been added and will be available in the upcoming release. Please keep in mind that this is an experimental feature and is subject to change.

You can find more information and a usage example here. If you have any questions or issues, please let us know.
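
As a rough illustration only, the export call is expected to look something like the sketch below; the function, enum, and argument names here are my assumptions and may not match the released API, so please follow the linked usage example for the exact call:

import model_compression_toolkit as mct

# Hypothetical sketch of the experimental int8 TFLite export. The exporter function,
# enums, and argument names are assumptions; the linked usage example is authoritative.
mct.exporter.keras_export_model(
    model=quantized_model,                 # model returned by the MCT quantization flow
    save_model_path='model_int8.tflite',
    serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
    quantization_format=mct.exporter.QuantizationFormat.INT8,
)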