TypeError in quantize_dequantize when applying QAT with certain parameters

busenuraktilav commented 1 year ago

I am working on applying Quantization-Aware Training (QAT) with various parameters to optimize my model. During this process, I ran into an issue when attempting to use certain configuration parameters.

AIMET Version: 1.23

Configuration Parameters:

quant_scheme = [QuantScheme.training_range_learning_with_tf_init]
rounding_mode = "nearest"
perform_only_empirical_bias_corr = False
bitwidth = 8

Error: While running the QAT, I encountered the following error:

File "/path/to/file/quantization_wrappers.py", line 49, in apply_bias_correction bias_correction.correct_bias( File "/path/to/file/.conda/envs/aimet5/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 307, in correct_bias call_analytical_mo_correct_bias(quantize_layer, None, None) File "/path/to/file/.conda/envs/aimet5/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 176, in call_analytical_mo_correct_bias quant_dequant_weight = get_quantized_dequantized_weight(layer) File "/path/to/file/.conda/envs/aimet5/lib/python3.8/site-packages/aimet_torch/bias_correction.py", line 98, in get_quantized_dequantized_weight quant_dequant_weights = weight_quantizer.quantize_dequantize(weight_tensor, weight_quantizer.round_mode) TypeError: quantize_dequantize() missing 1 required positional argument: 'encoding_max'

The error does not occur if I set perform_only_empirical_bias_corr to True or if I apply standard quantization instead of range learning. It happens for this specific configuration.

I am unsure why this issue is occurring. Any assistance on how to address this problem or insights into what might be causing it would be greatly appreciated. Thank you in advance for your time and assistance!

quic-hitameht commented 1 year ago

Hello @busenuraktilav From the traceback, it seems you have used Bias Correction which is one of the Post-Training Quantization(PTQ) techniques and not QAT. Bias Correction supports only QuantScheme.post_training_tf and QuantScheme.post_training_tf_enhanced quantization schemes. Let us know if you have further questions.

busenuraktilav commented 1 year ago

Hello @quic-hitameht, thank you for your prompt reply and explanation. I now understand why I faced this issue, it's due to the misalignment between the PTQ technique (Bias Correction) and the QAT quantization scheme.

The AIMET documentation recommends applying PTQ techniques before proceeding with QAT. I have been following this recommendation, applying Cross Layer Equalization (CLE) and Bias Correction (BC) before starting the QAT process. This works well when I use standard QAT, with either QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced as the quantization schemes.

Now, I am considering using range learning in QAT, which requires a different set of quantization schemes. My question is, should I still apply PTQ techniques like CLE or Adaround before proceeding with QAT range learning, or is it recommended to skip PTQ when using range learning in QAT?

Thank you in advance for your guidance and clarification on this matter.

quic-hitameht commented 1 year ago

It can be beneficial to first apply PTQ before applying QAT if there is large drop in INT performance compared to FP32 baseline.

quic-mangal commented 1 year ago

@busenuraktilav, to add to what Hitarth said, you can still use CLE + Adaround with Range learning quantization schemes. Bias correction only supports post_training quantization schemes ATM

busenuraktilav commented 1 year ago

Hello @quic-mangal and @quic-hitameht, thank you for your prompt responses and valuable insights. It's clear to me now that CLE + Adaround can be used before applying QAT, while Bias Correction should be left out for this particular setup. I will adjust my experiments accordingly. Thanks again for your help!

quic-mangal commented 1 year ago

Sounds good. Also, from our experiments we have seen that Adaround performs better than Bias correction

busenuraktilav commented 1 year ago

Thank you @quic-mangal for your input regarding the relative performance of Adaround. I will definitely take this into account in my experiments.

In relation to this, I have been observing some results that I find interesting. When I run the Quantization simulation QuantizationSimModel without actually applying any quantization and evaluating the results, the INT8 accuracy is almost the same as the FP32 accuracy results, which I consider as a baseline. When I apply PTQ, the results become even better, aligning with the FP32 accuracy. However, when I apply QAT, the results get worse than the baseline accuracy. I tried this with many different parameters but the QAT results are always worse than the baseline.

I find this a bit puzzling as I would expect the QAT results to be better, or at least not worse, than the baseline. Do you have any insights as to why this might be happening? Could it be related to the configuration parameters or might there be other factors at play? Am I missing something here?

I look forward to your guidance and advice on this matter. Thank you in advance!

quic-klhsieh commented 1 year ago

Hi @busenuraktilav , when you say the QAT results get worse, can you describe the behavior of the accuracy over the course of training? Do you see a sudden drop in accuracy starting from the first epoch? And does the accuracy continue to degrade over time, or does it increase slowly/level out without being able to get back to the accuracy before QAT?

We would agree that generally speaking, QAT should have a net neutral or positive effect on a quantized model's accuracy. It's true that using appropriate parameters will give better results, and we have some guidelines listed at the bottom of this page: https://quic.github.io/aimet-pages/releases/latest/user_guide/quantization_aware_training.html#ug-quantization-aware-training

Could you also clarify your statement on running quantization simulation without actually applying any quantization? If you have instantiated QuantSim and gone through compute_encodings, quantization will already be occurring in the model unless you have manually changed quantizer settings. Did you mean to say 'without actually applying any PTQ techniques'?

quic / aimet

TypeError in quantize_dequantize when applying QAT with certain parameters #2242