quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

aimet qnn quantization overwrite #2139

Closed hamzeasadi closed 1 year ago

hamzeasadi commented 1 year ago

Hello all

AIMET is used for quantization, and the AIMET encodings are then consumed by QNN. When a convolution layer is defined with bias=False, AIMET does not produce a quantization encoding for the Conv bias, but QNN still calculates a bias encoding, so the outputs end up differing. Could you please let me know if there is a fix for this problem?

I also noticed something regarding the channel pruning technique for compression. If the zero padding is defined inside the convolution layer, compression works fine, but if it is defined as a separate layer it fails with `AssertionError: Layer currently supports only Conv2D and Linear`.
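For reference, a minimal sketch of the two variants I mean, assuming a PyTorch model (layer sizes are arbitrary):

```python
import torch.nn as nn

# Padding specified inside the convolution itself -- channel pruning handles this fine:
conv_inline_pad = nn.Conv2d(16, 32, kernel_size=3, padding=1)

# Padding as a standalone layer ahead of the convolution -- this variant
# triggers the AssertionError above:
conv_separate_pad = nn.Sequential(
    nn.ZeroPad2d(1),
    nn.Conv2d(16, 32, kernel_size=3),
)
```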

quic-mangal commented 1 year ago

@quic-klhsieh, could you please help answer this question?

quic-klhsieh commented 1 year ago

@hamzeasadi , to clarify, are you describing a case where a Conv layer has been defined with bias=False, such that the original model does not contain a bias parameter to begin with? In this case, AIMET will not find any parameter to quantize, and will not be able to insert a quantizer for it. In order for AIMET to see a bias, you will need to modify your original model definition to include a bias.
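For illustration, a minimal sketch of that model change (layer sizes are arbitrary):

```python
import torch.nn as nn

# Defined with bias=False: no bias parameter exists, so AIMET has
# nothing to attach a bias quantizer to.
conv_no_bias = nn.Conv2d(16, 32, kernel_size=3, bias=False)

# Defined with bias=True (the default): the bias parameter exists, and
# AIMET can insert a quantizer for it.
conv_with_bias = nn.Conv2d(16, 32, kernel_size=3, bias=True)
```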

If there is a bias in the original model, AIMET will insert a quantizer for it. However, the default configuration file which Quantsim uses has bias quantization disabled, meaning that bias quantizers act as passthrough ops, since that is typically what we see for most runtimes. If you intend for bias to be quantized in your runtime, you can remove the disable setting for biases in the configuration file used with Quantsim. It's worth noting that runtimes may also have a setting for enabling or disabling bias quantization.

For more information on how to use the config file with Quantsim, please refer to the "Configuring Quantization Simulation Ops" section at https://quic.github.io/aimet-pages/releases/latest/user_guide/quantization_sim.html, as well as https://quic.github.io/aimet-pages/releases/latest/user_guide/quantization_configuration.html#ug-quantsim-config

The default configuration file used by Quantsim can be found here: https://github.com/quic/aimet/blob/develop/TrainingExtensions/common/src/python/aimet_common/quantsim_config/default_config.json

Bias quantization is currently disabled there via the `"is_quantized": "False"` setting for "bias" under the "params" section.
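A minimal sketch of that override for a PyTorch workflow; the file paths, the toy model, and the dummy-input shape below are placeholders for illustration:

```python
import json
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Hypothetical paths -- point DEFAULT_CONFIG at your local copy of AIMET's
# default_config.json.
DEFAULT_CONFIG = "quantsim_config/default_config.json"
CUSTOM_CONFIG = "custom_config.json"

with open(DEFAULT_CONFIG) as f:
    config = json.load(f)

# The default config disables bias quantization; enable it here.
# AIMET config files encode booleans as the strings "True"/"False".
config["params"]["bias"]["is_quantized"] = "True"

with open(CUSTOM_CONFIG, "w") as f:
    json.dump(config, f, indent=4)

# A toy model standing in for your trained network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3, bias=True))

# Hand the custom config to Quantsim via its config_file argument.
sim = QuantizationSimModel(model,
                           dummy_input=torch.rand(1, 3, 224, 224),
                           config_file=CUSTOM_CONFIG)
```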

Thanks for informing us about the issue you are seeing with zero padding used with channel pruning. Our current focus is on developing our quantization techniques, so improvements and fixes for the compression techniques are deprioritized. We'll keep this issue in mind for the future.

hamzeasadi commented 1 year ago

Thank you very much for clarifying. Following the same procedure, my results are satisfactory. The main question I have is why QNN computes a quantization encoding for the bias of a convolution layer when the bias is disabled. With bias=False there are no bias parameters at all, yet the QNN-generated encoding file still contains a quantization entry for the convolution's bias.

quic-akhobare commented 1 year ago

> Thank you very much for clarifying. Following the same procedure, my results are satisfactory. The main question I have is why QNN computes a quantization encoding for the bias of a convolution layer when the bias is disabled. With bias=False there are no bias parameters at all, yet the QNN-generated encoding file still contains a quantization entry for the convolution's bias.

Hi @hamzeasadi - please reach out to the QNN team separately to understand the details there. What I do know is that a command-line flag lets you specify a 32-bit bias for conv layers when running in QNN. That is effectively equivalent to disabling quant-dequant for the bias in AIMET, since 32 bits is high enough resolution not to add any perceptible quantization noise.

hamzeasadi commented 1 year ago

Thank you so much for your response.

hamzeasadi commented 1 year ago

I will close the issue since the AIMET part is resolved.