quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

ResNet50 AIMET quantized (int8) DLC model gives higher accuracy than the AIMET int8 model and matches the FP32 results #1626

Open bommineniravali opened 1 year ago

bommineniravali commented 1 year ago

Hi, I started from the predefined PyTorch ResNet50 FP32 model. I took the aimet_resnet50_int8 ONNX model (exported with the sim.export API), converted it to DLC, and validated the DLC model with the ImageNet dataset on target, but I am getting higher accuracy than the AIMET int8 results. I have used the following techniques on the AIMET side (a sketch of this flow is shown below):
- quantization simulation (tf_enhanced)
- CLE
- CLE + AdaRound + quantsim
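For readers following along, here is a minimal sketch of the kind of flow described above: CLE on the FP32 model, a QuantizationSimModel with the tf_enhanced scheme, and sim.export. It is not the poster's exact script; it assumes the aimet_torch 1.x style API, and the calibration callback is only a placeholder.

```python
# Minimal sketch of the flow described above (aimet_torch 1.x style API assumed).
# The calibration callback below is a placeholder, not a full ImageNet pipeline.
import os

import torch
from torchvision.models import resnet50

from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

model = resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Cross-Layer Equalization (CLE): batchnorm fold, cross-layer scaling, bias fold
equalize_model(model, (1, 3, 224, 224))

# Quantization simulation: int8 weights and activations, tf_enhanced scheme
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,
    default_param_bw=8,
    default_output_bw=8,
)

def pass_calibration_data(sim_model, _):
    # Placeholder: in practice, run a few hundred ImageNet samples through sim_model
    with torch.no_grad():
        sim_model(dummy_input)

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

# Export writes an FP32 ONNX model plus a .encodings file for the SDK converter
os.makedirs("./export", exist_ok=True)
sim.export(path="./export", filename_prefix="aimet_resnet50_int8",
           dummy_input=dummy_input)
```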

With all of the methods I have tried above, the DLC validation results match the FP32 results and do not match the AIMET int8 results. Why am I getting higher accuracy with int8 on the target CPU than with the AIMET int8 results? Also, after quantization the model is not reduced; it remains the same size as the FP32 model. Why is the model size not decreasing after quantization?

Could you please advise on these issues?

quic-akhobare commented 1 year ago

Hi @bommineniravali

With all of the methods I have tried above, the DLC validation results match the FP32 results and do not match the AIMET int8 results. Why am I getting higher accuracy with int8 on the target CPU than with the AIMET int8 results?

From your description it appears that you are using the SNPE or QNN SDK. Which backend did you choose to run on? Note that the ARM CPU backend will not run the model quantized, so you should see close to FP32 accuracy.
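As a hypothetical illustration (the helper and data-loader names below are placeholders, not AIMET or SNPE APIs): the FP32 model's ImageNet accuracy is roughly what a non-quantizing CPU backend reproduces, while the accuracy measured on sim.model from the sketch in the first comment is what a quantized backend run should be compared against.

```python
# Hypothetical evaluation helper; `imagenet_loader` is a placeholder DataLoader.
import torch

def top1_accuracy(net, loader, device="cpu"):
    """Plain top-1 accuracy loop over (image, label) batches."""
    net.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = net(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# fp32_top1 = top1_accuracy(model, imagenet_loader)      # ~ ARM CPU (FP32) backend
# int8_top1 = top1_accuracy(sim.model, imagenet_loader)  # ~ quantized backend on target
```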

After quantization the model is not reduced; it remains the same size as the FP32 model. Why is the model size not decreasing after quantization?

AIMET does not itself quantize the model. AIMET optimizes the model for quantization, so that accuracy is improved when the model is subsequently run on a quantized target. The model that comes out of AIMET is therefore still FP32, and you will not see a model size reduction at that point. But when you take the model to a target device, say using the Qualcomm Neural Processing SDK, you should see a smaller model and much faster inference on target.
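For example (assuming the aimet_torch export layout and the path/prefix used in the sketch in the first comment), the export produces an FP32 ONNX graph plus a small JSON .encodings file; the size reduction only appears once the SDK converter/runtime applies those encodings on target:

```python
# Inspect the export artifacts (paths assume the earlier sketch's path/prefix).
import os

export_dir = "./export"
prefix = "aimet_resnet50_int8"

for name in (prefix + ".onnx", prefix + ".encodings"):
    p = os.path.join(export_dir, name)
    if os.path.exists(p):
        print(f"{p}: {os.path.getsize(p) / 1e6:.1f} MB")

# Expect the .onnx file to be roughly the same size as the original FP32 model;
# the .encodings file only stores per-tensor quantization parameters (scales,
# offsets, bitwidths) that the SDK converter uses to quantize for the target.
```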