tensorflow / model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
https://www.tensorflow.org/model_optimization
Apache License 2.0

Per-Tensor quantization support for Conv2D layers #438

Closed LLNLanLeN closed 4 years ago

LLNLanLeN commented 4 years ago


Motivation

I've been testing TF QAT features by following the tutorials and guides on the following website:

https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide

My understanding is that TF only has per-axis support for Conv2D layers and is still working on per-tensor support. Right now, I'm working with a deployment target that requires per-tensor quantization for Conv2D, and simply passing a custom QuantizeConfig class to the Conv2D layer and changing the weight quantizer's per_axis flag to False causes errors with the TF quantize API.

Hence I'm wondering if there are any resources or additional experimental features that I can try out to perform per-tensor quantization for Conv2D layers?

nutsiepully commented 4 years ago

Hi @LLNLanLeN,

The default implementation for QAT follows the TF (default 8 bit) quantization spec.

If you want something different, you can use a custom QuantizeConfig as in the guide. However, that just ensures that you can train QAT with per-tensor quantization. For custom configurations, you have to provide the relevant kernels while executing the model.
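For example, a per-tensor config for Conv2D might look roughly like the sketch below. This just adapts the Dense example from the comprehensive guide to Conv2D; PerTensorConvQuantizeConfig is only an illustrative name, not something shipped in the toolkit:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_apply = tfmot.quantization.keras.quantize_apply
quantize_scope = tfmot.quantization.keras.quantize_scope
quantizers = tfmot.quantization.keras.quantizers


class PerTensorConvQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
  """Illustrative config: quantize Conv2D weights per-tensor instead of per-axis."""

  def get_weights_and_quantizers(self, layer):
    # per_axis=False -> a single scale/zero-point for the whole kernel.
    return [(layer.kernel, quantizers.LastValueQuantizer(
        num_bits=8, per_axis=False, symmetric=True, narrow_range=True))]

  def get_activations_and_quantizers(self, layer):
    return [(layer.activation, quantizers.MovingAverageQuantizer(
        num_bits=8, per_axis=False, symmetric=False, narrow_range=False))]

  def set_quantize_weights(self, layer, quantize_weights):
    layer.kernel = quantize_weights[0]

  def set_quantize_activations(self, layer, quantize_activations):
    layer.activation = quantize_activations[0]

  def get_output_quantizers(self, layer):
    return []

  def get_config(self):
    return {}


annotated_model = quantize_annotate_model(tf.keras.Sequential([
    quantize_annotate_layer(
        tf.keras.layers.Conv2D(32, 3, input_shape=(28, 28, 1)),
        quantize_config=PerTensorConvQuantizeConfig()),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
]))

# The custom config must be registered via quantize_scope before applying.
with quantize_scope({'PerTensorConvQuantizeConfig': PerTensorConvQuantizeConfig}):
  qat_model = quantize_apply(annotated_model)
```

Again, this only changes how the fake-quant nodes are inserted during training; the backend you deploy to still has to provide matching per-tensor kernels.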

LLNLanLeN commented 4 years ago

Hi @nutsiepully, thank you for getting back to me. I'm wondering if there are any examples that can help me quantize the Conv2D weights per-tensor instead of per-axis? The examples in the comprehensive QAT guide are only for Dense layers, and they aren't directly applicable to Conv2D layers.

I've been using the configuration for Conv2D called Default8BitConvQuantizeConfig, which I found here:

https://github.com/tensorflow/model-optimization/blob/fcaa2306d62a419c5bce700275748b8b08711dbc/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantize_registry.py#L486

I ended up modifying the line self.weight_quantizer = default_8bit_quantizers.Default8BitConvWeightsQuantizer() (which is per-axis by default) to per-tensor by setting the argument per_axis = False:

https://github.com/tensorflow/model-optimization/blob/fcaa2306d62a419c5bce700275748b8b08711dbc/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantizers.py

Unfortunately, simply changing that caused a size mismatch, since some of the underlying code assumes that Conv2D uses per-axis quantization.

nutsiepully commented 4 years ago

I ended up modifying the line self.weight_quantizer = default_8bit_quantizers.Default8BitConvWeightsQuantizer() (which is per-axis by default) to per-tensor by setting the argument per_axis = False:

This is the correct way to do it.

Unfortunately, simply changing that caused a size mismatch, since some of the underlying code assumes that Conv2D uses per-axis quantization.

Modifying the training to happen per-tensor should allow the training to work just fine. However, that does not guarantee conversion to TFLite. Only the default quantization spec supports conversion to TFLite. Modifications can be used to train your model against any target backend you want.
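For reference, the conversion path that the default spec targets is just the standard Keras-to-TFLite flow; a minimal sketch, assuming qat_model is the quantization-aware model produced by quantize_apply or quantize_model:

```python
import tensorflow as tf

# Standard post-QAT conversion with the default 8-bit quantization spec.
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('model_quant.tflite', 'wb') as f:
  f.write(tflite_model)
```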

LLNLanLeN commented 4 years ago

@nutsiepully I see, thank you for responding. I've managed to quantize Conv2D per-tensor by passing in a custom configuration. It turns out that in addition to setting per_axis = False, I needed to change the min_weight and max_weight shape to None as well:

https://github.com/tensorflow/model-optimization/blob/fcaa2306d62a419c5bce700275748b8b08711dbc/tensorflow_model_optimization/python/core/quantization/keras/default_8bit/default_8bit_quantizers.py
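In other words, the change amounts to something like the sketch below (the class name is illustrative, and the build() signature follows the linked default_8bit_quantizers.py): per_axis=False in the constructor, plus scalar (shape=None) min/max variables instead of one pair per output channel:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantizers = tfmot.quantization.keras.quantizers


class PerTensorConvWeightsQuantizer(quantizers.LastValueQuantizer):
  """Illustrative per-tensor variant of Default8BitConvWeightsQuantizer."""

  def __init__(self):
    # per_axis=False -> one scale for the whole kernel.
    super(PerTensorConvWeightsQuantizer, self).__init__(
        num_bits=8, per_axis=False, symmetric=True, narrow_range=True)

  def build(self, tensor_shape, name, layer):
    # shape=None -> scalar min/max variables, matching the per-tensor
    # fake-quant op (the default conv quantizer uses shape=(tensor_shape[-1],)).
    min_weight = layer.add_weight(
        name + '_min',
        shape=None,
        initializer=tf.keras.initializers.Constant(-6.0),
        trainable=False)
    max_weight = layer.add_weight(
        name + '_max',
        shape=None,
        initializer=tf.keras.initializers.Constant(6.0),
        trainable=False)
    return {'min_var': min_weight, 'max_var': max_weight}
```

This quantizer is then returned from get_weights_and_quantizers() in the custom QuantizeConfig passed to the Conv2D layer.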

However, after that, when I convert the model to tflite, the BatchNorm layers, which normally get folded into the preceding Conv2D, are no longer being folded. Without more information, I know it can be difficult for you to guide me, but if you have any idea where or why the folding didn't happen correctly, please point me to it.

This is the tflite model using the default quantization parameters (screenshot: Capture_1)

This is the tflite model using Conv2D per-tensor quantization parameters (I also pass a no-op config for the BatchNorm layers here). As you can see, the BatchNorm layer did not get folded properly (screenshot: Capture_2)
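(For context, the no-op config mentioned above is essentially a QuantizeConfig that quantizes nothing; something like this sketch, with an illustrative name:)

```python
import tensorflow_model_optimization as tfmot


class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
  """Illustrative pass-through config: quantize no weights and no activations."""

  def get_weights_and_quantizers(self, layer):
    return []

  def get_activations_and_quantizers(self, layer):
    return []

  def set_quantize_weights(self, layer, quantize_weights):
    pass

  def set_quantize_activations(self, layer, quantize_activations):
    pass

  def get_output_quantizers(self, layer):
    return []

  def get_config(self):
    return {}
```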

debapriyamaji commented 4 years ago

Hi @LLNLanLeN, if you are still stuck with the merging issue, please refer to this: https://github.com/tensorflow/model-optimization/issues/552

LLNLanLeN commented 4 years ago

@debapriyamaji Thank you for notifying me. I've found an alternative solution, but I'll still check the post.

biyoml commented 4 years ago

Hi @LLNLanLeN , I have the same issue. Could you please share your solution?

LLNLanLeN commented 4 years ago

@jackjhliu hey, I recommend you try the solution posted by @debapriyamaji above (there should be a thread leading to it). If that method works, please comment on that thread and let us know. If it still doesn't work, I can recommend another solution, but it's a bit trickier to do and definitely requires more time.

biyoml commented 4 years ago

@LLNLanLeN Ok, I will try. Thank you for your reply.

nutsiepully commented 4 years ago

Modifying the training to happen per-tensor should allow the training to work just fine. However, that does not guarantee conversion to TFLite. Only the default quantization spec supports conversion to TFLite. Modifications can be used to train your model against any target backend you want.

I'm afraid conversion is not guaranteed or supported for custom quantization. Conversion only works for the default quantization spec.

ai1361720220000 commented 3 years ago

Modifying the training to happen per-tensor should allow the training to work just fine. However, that does not guarantee conversion to TFLite. Only the default quantization spec supports conversion to TFLite. Modifications can be used to train your model against any target backend you want.

I'm afraid conversion is not guaranteed or supported for custom quantization. Conversion only works for the default quantization spec.

Hello, does conversion now support per-tensor (per-layer) quantization for Conv2D?

danielmimimi commented 1 year ago

@LLNLanLeN could you upload your code showing how you made the adjustment to the Default8BitConvWeightsQuantizer, and how you actually used it?

I have trouble doing it the way you described.

Thanks

LLNLanLeN commented 1 year ago

@danielmimimi hey, I moved away from the TF framework a while ago, so I can't recall the details of the issue I came across here.