quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

AIMET vs SNPE quantization #2854

Open Piotr94 opened 7 months ago

Piotr94 commented 7 months ago

I would like to ask some questions about the difference between AIMET and SNPE quantization.

I am attempting to quantize a video denoising model.

I started with SNPE and used the following commands:

snpe-onnx-to-dlc -i MODEL_NAME.onnx -o MODEL_NAME.dlc
snpe-dlc-quantize --input_dlc MODEL_NAME.dlc --input_list Inputlist.txt --use_enhanced_quantizer --use_adjusted_weights_quantizer --axis_quant --output_dlc Quant_MODEL_NAME.dlc --enable_htp --htp_socs sm8550

For calibration, I used 5 inputs with different levels of noise. Then I tested the quantized model on exactly the same inputs using the command below:

snpe-net-run --container Quant_MODEL_NAME.dlc --input_list Inputlist.txt

The quality is lower, but the difference is acceptable: on average, there is a drop from 37.77 dB to 37.17 dB.

To have more control over quantization, I wanted to use AIMET. I used the code below to obtain the quantized model:

import onnx
import numpy as np

from aimet_onnx.batch_norm_fold import fold_all_batch_norms_to_weight
from aimet_onnx.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme

onnx_model = onnx.load("MODEL_NAME.onnx")

# Dummy input matching the model's single input (named '0' in the ONNX graph)
input_shape = (1, 13, 320, 320)
dummy_data = np.random.randn(*input_shape).astype(np.float32)
dummy_input = {'0': dummy_data}

# Fold batch norms into the adjacent weights before simulating quantization
_ = fold_all_batch_norms_to_weight(onnx_model)

sim = QuantizationSimModel(onnx_model, quant_scheme=QuantScheme.post_training_tf_enhanced,
                           rounding_mode='nearest', default_param_bw=8, default_activation_bw=8,
                           use_symmetric_encodings=False, use_cuda=True)

Then, for calibration, I used the same 5 inputs as for SNPE:

def pass_calibration_data(session, args):
    # Forward-pass the calibration inputs through the quantsim session so
    # AIMET can observe activation ranges and compute encodings
    eval_dataset = LabeledDatasetWrapper()
    for input_data, _ in eval_dataset:
        input_dict = {'0': input_data[None, :]}
        session.run(None, input_dict)

sim.compute_encodings(pass_calibration_data, None)

Later, I evaluated the quantized model on the same 5 inputs, but the obtained results were very poor, with an average PSNR of 25 dB. For evaluation, I used the following code:

import torch  # psnr() below is my own PSNR metric (not shown)

eval_dataset = LabeledDatasetWrapper()
for input_data, clean in eval_dataset:
    input_dict = {'0': input_data[None, :]}
    # Run inference through the quantization-simulated ONNX Runtime session
    outputs = sim.session.run(None, input_dict)
    print(psnr(torch.tensor(outputs[0]), torch.tensor(clean)))

I presume that 5 inputs are not enough, but the difference between the obtained results is very confusing for me. Could you tell me what I can do to obtain the same results with AIMET as with SNPE quantization? Are there any mistakes in the current AIMET approach?

quic-mangal commented 6 months ago

@Piotr94, AIMET does not support adjusted_weights_quantizer; could you disable it in SNPE as well? Also, are you using per-channel quantization in AIMET? Your SNPE command enables it (--axis_quant).
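
For example, to match your current AIMET settings (per-tensor, no weight adjustment), the quantize step from your post would become something like:

snpe-dlc-quantize --input_dlc MODEL_NAME.dlc --input_list Inputlist.txt --use_enhanced_quantizer --output_dlc Quant_MODEL_NAME.dlc --enable_htp --htp_socs sm8550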

> I presume that 5 inputs are not enough, but the difference between the obtained results is very confusing for me.

Yes, 5 is a very small dataset.
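
To rule out calibration-set differences, and to make it easy to scale beyond 5 samples later, one option is to feed AIMET the same raw inputs that SNPE consumed. A sketch, assuming each line of Inputlist.txt is a plain path to a float32 raw file matching your (1, 13, 320, 320) input:

import numpy as np

def pass_calibration_data(session, args):
    # Reuse the SNPE calibration list; each line is assumed to be a bare
    # path to a raw float32 tensor file (SNPE input lists can also use the
    # "input_name:=path" form, which would need extra parsing)
    with open("Inputlist.txt") as f:
        for line in f:
            raw = np.fromfile(line.strip(), dtype=np.float32)
            session.run(None, {'0': raw.reshape(1, 13, 320, 320)})

sim.compute_encodings(pass_calibration_data, None)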

Piotr94 commented 6 months ago

Thanks for your answer. Indeed, I didn't use per-channel quantization in AIMET, but I don't see how I could set it. In the documentation for QuantizationSimModel (link) I couldn't find such an option. Can you tell me where I can find it / how I can enable it?

quic-mangal commented 6 months ago

Yes, it looks like this is not documented. For per-channel quantization you can find an example here: https://github.com/quic/aimet/blob/develop/NightlyTests/torch/test_quantize_resnet18.py

You need to change the config file that is passed to QuantSim, as shown in save_config_file_for_per_channel_quantization.
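
For concreteness, here is a minimal sketch of that config change, assuming the schema used by AIMET's default quantsim config and that QuantizationSimModel accepts a config_file argument as in the linked test:

import json

# Quantsim config with per-channel quantization enabled under "defaults"
# (sketch adapted from save_config_file_for_per_channel_quantization)
per_channel_config = {
    "defaults": {
        "ops": {"is_output_quantized": "True"},
        "params": {"is_quantized": "True", "is_symmetric": "True"},
        "per_channel_quantization": "True",
    },
    "params": {},
    "op_type": {},
    "supergroups": [],
    "model_input": {},
    "model_output": {},
}

with open("per_channel_config.json", "w") as f:
    json.dump(per_channel_config, f)

sim = QuantizationSimModel(onnx_model, quant_scheme=QuantScheme.post_training_tf_enhanced,
                           rounding_mode='nearest', default_param_bw=8, default_activation_bw=8,
                           use_symmetric_encodings=False, use_cuda=True,
                           config_file="per_channel_config.json")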