xiexiaozheng opened this issue 1 year ago
Hello @xiexiaozheng, per channel quantization is recommended for convolution-like operations to increase the resolution.
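For reference, a minimal sketch of how per-channel quantization can be enabled through an AIMET quantsim config file (the file name and the exact defaults below are illustrative; the contents should mirror the default config JSON shipped with your AIMET release):

```python
# Sketch: write a quantsim config enabling per-channel quantization.
# Only "per_channel_quantization" differs from the usual per-tensor default.
import json

config = {
    "defaults": {
        "ops": {"is_output_quantized": "True"},
        "params": {"is_quantized": "True"},
        "per_channel_quantization": "True",  # per-channel instead of per-tensor
    },
    "params": {"bias": {"is_quantized": "False"}},
    "op_type": {},
    "supergroups": [
        {"op_list": ["Conv", "HardSwish"]},  # fuse Conv + HardSwish quantizers
    ],
    "model_input": {"is_input_quantized": "True"},
    "model_output": {},
}

with open("quantsim_config.json", "w") as f:
    json.dump(config, f, indent=4)
```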
@quic-akinlawo But the accuracy of the QAT model is fine; the accuracy only decreases when the model is deployed to the DSP.
Is there any progress on this matter? I am very curious as to what led to this result.
@1SingleFeng Hi, I encountered the same issue. Have you found the cause of the problem?
AIMET version: 1.28
SNPE version: 2.14
Deploy platform: SM8550 DSP
Quantization: W8 A8, bias 32-bit
I have a model whose backbone is MobileNetV3. As you know, MobileNetV3 consists primarily of pointwise and depthwise convolutions. I used AIMET 1.28 to apply Quantization-Aware Training (QAT) to this model, and the QAT model achieved the same accuracy as the FP32 model. The QAT configuration is as follows:
weight bitwidth = 8, activation bitwidth = 8
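For context, a minimal sketch of this kind of QAT setup, assuming AIMET 1.x's PyTorch API (the model, input shape, and config path are placeholders, not taken from this issue):

```python
# Sketch: W8/A8 QuantizationSimModel setup for QAT (AIMET 1.x PyTorch API).
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = ...  # FP32 MobileNetV3-based model (placeholder)
dummy_input = torch.randn(1, 3, 224, 224)  # illustrative input shape

sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.training_range_learning_with_tf_init,
    default_param_bw=8,    # weight bitwidth = 8
    default_output_bw=8,   # activation bitwidth = 8
    config_file="quantsim_config.json",  # config with the Conv+HardSwish supergroup
)

# Calibrate initial encodings with a representative forward pass.
def forward_pass(model, _):
    model.eval()
    with torch.no_grad():
        model(dummy_input)

sim.compute_encodings(forward_pass, None)

# Fine-tune sim.model with the usual training loop (QAT), then evaluate.
```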
I exported the model in ONNX format along with the statistics (encodings) file and converted it to DLC. When testing on the DSP, I observed a significant drop in accuracy.
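Continuing the sketch above, the export step would presumably look something like this (paths and prefix are illustrative):

```python
# Sketch: export the ONNX model plus the .encodings file for SNPE.
import os

export_dir = "./export"  # illustrative path
os.makedirs(export_dir, exist_ok=True)

# Produces mobilenetv3_qat.onnx and mobilenetv3_qat.encodings, which the
# SNPE converter can consume through its quantization-overrides mechanism.
sim.export(path=export_dir,
           filename_prefix="mobilenetv3_qat",
           dummy_input=dummy_input.cpu())
```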
To identify which layer of the model is causing this issue, my debugging approach was as follows: I split the network into two parts, ran the first part on the DSP, and ran the second part in the QAT model generated by QuantizationSimModel, using the DSP output as its input (a sketch of this check is given below).

After these tests I noticed the following pattern: every time the data passes through a depthwise convolution, the accuracy decreases (initially, with fewer channels, the drop is small). This is especially noticeable in one depthwise convolution layer with 672 channels, where the accuracy drops by half. Is this because I am using per-tensor quantization? Interestingly, the QAT model itself does not suffer from this degradation; the significant accuracy drop only appears when the model is deployed on the DSP. Additionally, each depthwise convolution is followed by a HardSwish activation, and I added ["Conv", "HardSwish"] to the supergroups section of the config file.
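A rough sketch of that bisection-style check, assuming the DSP's intermediate activation is dumped to a raw float32 file (the file name, tensor shape, and split point are hypothetical):

```python
# Sketch: feed the DSP output of the first sub-network into the
# QuantizationSimModel second half, to localize the diverging layer.
import numpy as np
import torch

# Hypothetical raw float32 dump of the DSP's intermediate activation,
# reshaped to the known shape at the chosen split point.
dsp_out = np.fromfile("dsp_intermediate.raw", dtype=np.float32)
dsp_out = torch.from_numpy(dsp_out).reshape(1, 672, 14, 14)  # example shape

second_half = ...  # sub-module of sim.model covering the layers after the split

with torch.no_grad():
    ref = second_half(dsp_out)
# Compare `ref` against the full-DSP output to see where accuracy diverges.
```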
The quantization statistics for the depthwise convolution weights and for the HardSwish activation outputs both come from the encoding file exported by AIMET, while the quantization statistics for the bias come from the snpe-dlc-quant command-line tool.
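One way to sanity-check whether a single per-tensor range is the culprit is to inspect the exported encodings for the 672-channel depthwise weight; a sketch follows (the file and tensor names are placeholders):

```python
# Sketch: inspect the AIMET .encodings JSON for a suspect layer's weight range.
import json

with open("mobilenetv3_qat.encodings") as f:
    enc = json.load(f)

# "param_encodings" maps parameter names to a list of encoding dicts
# (one entry for per-tensor, or one per channel for per-channel).
name = "features.12.block.1.0.weight"  # placeholder for the 672-ch depthwise conv
for e in enc["param_encodings"][name]:
    print(e["bitwidth"], e["min"], e["max"], e["scale"], e["offset"])
```

A very wide per-tensor min/max here would suggest that one outlier channel is dominating the scale for all 672 channels, which per-channel quantization would avoid.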
The structure is shown below.