quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

Result from AIMET evaluation and result after quantization on 8295 don't match #2930

Open superpigforever opened 6 months ago

superpigforever commented 6 months ago

Hi, I tried QAT on a model and exported the encodings. Then I ran qnn-onnx-converter with --quantization_overrides and --input_list to carry the post-QAT min/max/scale values into the converted model. However, even though the evaluation of the AIMET model is very good, the result I get from inference on the 8295 is noticeably worse, and I'm not sure what is wrong.

By the way, in the JSON file generated by qnn-onnx-converter there is no batchnorm, even though the encoding file contains batchnorm entries. The command I used is qnn-onnx-converter --input_network xxx.onnx --quantization_overrides xxx.encodings --input_list xxx.txt
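
For context, a minimal sketch of the QAT-and-export flow described above, assuming the aimet_torch v1 API; the model, input shape, and file names are placeholders rather than the actual setup from this issue:

```python
import torch
from torchvision.models import resnet18
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()                   # placeholder for the trained FP32 model
dummy_input = torch.randn(1, 3, 224, 224)   # placeholder input shape

# Wrap the model with quantization-simulation ops
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init)

# Compute initial encodings with a calibration forward pass
def forward_pass(m, _):
    m(dummy_input)

sim.compute_encodings(forward_pass, None)

# ... fine-tune sim.model here (QAT) ...

# Export the ONNX model plus the .encodings file passed to --quantization_overrides
sim.export(path='./output', filename_prefix='model_qat', dummy_input=dummy_input)
```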

quic-akinlawo commented 6 months ago

Hello @superpigforever, batchnorms are optimized out during conversion by folding the encoding values into the preceding conv2d (including depthwise and transpose variants) or fully connected layers. As such, the missing batchnorm operation is expected.
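
For reference, the folding itself is just a per-output-channel rescaling of the conv weights and bias. A minimal numpy sketch of the standard arithmetic (not AIMET's or QNN's actual implementation):

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv.

    w: conv weight (out_ch, in_ch, kh, kw); b: conv bias (out_ch,).
    gamma, beta, mean, var: per-channel BN parameters (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)         # per-output-channel multiplier
    w_folded = w * scale[:, None, None, None]  # rescale each output channel's filter
    b_folded = (b - mean) * scale + beta       # fold mean/shift into the bias
    return w_folded, b_folded
```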

I would recommend a layer-wise comparison between the fp32 model and the QNN quantized model. That could help narrow down the source of the regression.
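
One way to run that comparison offline, sketched under the assumption that you can dump per-layer outputs from the QNN side as raw float32 files; all file names, shapes, and the dump layout here are hypothetical:

```python
import numpy as np
import onnx
import onnxruntime as ort

# Promote every intermediate tensor of the FP32 model to a graph output
model = onnx.load("model_fp32.onnx")  # hypothetical file name
existing = {o.name for o in model.graph.output}
for node in model.graph.node:
    for out in node.output:
        if out not in existing:
            model.graph.output.append(onnx.helper.make_empty_tensor_value_info(out))
            existing.add(out)
onnx.save(model, "model_fp32_all_outputs.onnx")

sess = ort.InferenceSession("model_fp32_all_outputs.onnx")
input_name = sess.get_inputs()[0].name
x = np.fromfile("input_0.raw", dtype=np.float32).reshape(1, 3, 224, 224)  # hypothetical shape

names = [o.name for o in sess.get_outputs()]
fp32_outs = dict(zip(names, sess.run(None, {input_name: x})))

# Compare each FP32 tensor against the matching per-layer dump from the target
for name, ref in fp32_outs.items():
    qnn = np.fromfile(f"qnn_dumps/{name}.raw", dtype=np.float32).reshape(ref.shape)
    print(f"{name}: max abs diff {np.abs(ref - qnn).max():.6f}")
```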

e-said commented 6 months ago

Hi @superpigforever,

There are two points I would recommend checking:

1/ BN folding during QAT (using the method fold_all_batch_norms) => this is recommended to ensure consistency between QAT and hardware inference; see the sketch below.
2/ Ensure that the encodings in the cpp file generated by qnn-onnx-converter contain the encodings coming from AIMET QAT.
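
A minimal sketch of point 1/, assuming the aimet_torch API; the model and input shape are placeholders:

```python
from torchvision.models import resnet18
from aimet_torch.batch_norm_fold import fold_all_batch_norms

model = resnet18().eval()  # placeholder for the trained FP32 model

# Fold every foldable BatchNorm into its preceding conv/linear layer *before*
# creating the QuantizationSimModel, so QAT simulates the same graph
# qnn-onnx-converter will produce.
fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))
```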