superpigforever opened this issue 6 months ago
Hello @superpigforever, batchnorms are optimized out during conversion: their parameters and encoding values are folded into the preceding conv2d (including the depthwise and transpose variants) or fully connected layer. The missing batchnorm operations are therefore expected.
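For context, the folding itself is just a per-channel rescale and shift of the preceding conv's weights and bias. A minimal numpy sketch of the math (names are illustrative, not the converter's internals):

```python
import numpy as np

def fold_bn_into_conv(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
    """Fold batchnorm parameters into the preceding conv's weight and bias.

    conv_w: (out_ch, in_ch, kh, kw) weights, conv_b: (out_ch,) bias.
    gamma, beta, mean, var: per-channel batchnorm parameters, shape (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)                # per-output-channel scale
    folded_w = conv_w * scale[:, None, None, None]    # rescale each output filter
    folded_b = (conv_b - mean) * scale + beta         # fold the shift into the bias
    return folded_w, folded_b
```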
I would recommend a layer-wise comparison between the fp32 model and the QNN quantized model. That could help narrow down the source of the regression.
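One rough way to get the fp32 side of that comparison is to expose every intermediate tensor of the ONNX model as a graph output and dump them with onnxruntime; the file names below are placeholders, and you would compare the results against the per-layer outputs dumped from the quantized QNN run (check your SDK docs for its debug/dump options):

```python
import numpy as np
import onnx
import onnxruntime as ort

model = onnx.load("model_fp32.onnx")                 # placeholder model path

# Expose every intermediate tensor as a graph output so it can be dumped.
existing_outputs = {o.name for o in model.graph.output}
for node in model.graph.node:
    for tensor_name in node.output:
        if tensor_name not in existing_outputs:
            model.graph.output.extend([onnx.ValueInfoProto(name=tensor_name)])
            existing_outputs.add(tensor_name)
onnx.save(model, "model_fp32_all_outputs.onnx")

sess = ort.InferenceSession("model_fp32_all_outputs.onnx")
input_name = sess.get_inputs()[0].name
x = np.load("sample_input.npy")                      # one sample from the input_list
output_names = [o.name for o in sess.get_outputs()]
fp32_outputs = dict(zip(output_names, sess.run(None, {input_name: x})))

# fp32_outputs can now be compared, tensor by tensor, against the per-layer
# outputs from the quantized run to find where the error first grows.
```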
Hi @superpigforever,
There are two points I would recommend checking:
1/ BN folding during QAT (using the method fold_all_batch_norms). This is recommended to keep the QAT graph consistent with what runs on the hardware; see the sketch after this list.
2/ Ensure that the encodings in the cpp file generated by qnn-onnx-converter contain the encodings coming from AIMET QAT.
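A rough sketch of point 1/, assuming the PyTorch flavor of AIMET (the tiny model, input shape, and export paths are placeholders for your own):

```python
import torch
from aimet_torch.batch_norm_fold import fold_all_batch_norms
from aimet_torch.quantsim import QuantizationSimModel

# Stand-in model with a conv+bn pair; your real model goes here.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Fold BN into the preceding conv *before* building the quantization sim,
# so QAT trains against the same folded graph the target will execute.
fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))

sim = QuantizationSimModel(model, dummy_input=dummy_input)
# ... sim.compute_encodings(...), then QAT fine-tuning on sim.model ...
sim.export(path="./export", filename_prefix="my_model", dummy_input=dummy_input)
```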
Hi: I tried QAT on a model and exported the encodings. Then I ran qnn-onnx-converter with --quantization_overrides and --input_list to carry the min/max/scale values from QAT into the converted model. However, even though the evaluation of the AIMET model is very good, the results I get from inference on the 8295 are not. I'm not sure what is wrong.
By the way, in the json file generated by qnn-onnx-converter there is no batchnorm, even though there are batchnorm entries in the encoding file. The command I used is qnn-onnx-converter --input_network xxx.onnx --quantization_overrides xxx.encodings --input_list xxx.txt
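As a quick sanity check on point 2/ from the earlier comment, you could diff the tensor names to see which entries from the AIMET .encodings file survive into the converter output. The file names below are placeholders and the output-JSON layout depends on the QNN SDK version, so treat this as a crude sketch:

```python
import json

# Placeholders: the AIMET-exported encodings and the converter's output JSON.
with open("xxx.encodings") as f:
    aimet_enc = json.load(f)
with open("xxx_net.json") as f:          # name/layout depend on the QNN SDK version
    qnn_graph = json.load(f)

aimet_names = set(aimet_enc.get("activation_encodings", {})) | \
              set(aimet_enc.get("param_encodings", {}))
qnn_strings = set(json.dumps(qnn_graph).split('"'))   # crude: every string in the JSON

# Tensors whose encodings never made it into the converted graph. The batchnorm
# outputs are expected to show up here, since BN is folded away during conversion.
missing = sorted(n for n in aimet_names if n not in qnn_strings)
print("\n".join(missing) or "all AIMET encodings found in the converter output")
```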