quic / aimet

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
https://quic.github.io/aimet-pages/index.html

AutoQuant Eval accuracy is 0 #2601

Open sandeep1404 opened 9 months ago

sandeep1404 commented 9 months ago

Hi all, I was trying to use a ResNet50 model for MNIST digit classification. After training the model, I tried to run AutoQuant on it, but it always reports an eval accuracy of 0. I am not sure where I went wrong and am unable to debug the problem. Can someone help resolve this issue? Thank you in advance.

quic-hitameht commented 9 months ago

Hello @sandeep1404, could you please share a minimal script to reproduce the issue?

sandeep1404 commented 9 months ago

Hi @quic-hitameht, thanks for the reply. Please find my notebook attached for reference: ptq_test_aimet-resnet50_mnist_final.pdf

sandeep1404 commented 8 months ago

Hi, I was trying to do MNIST classification using a VGG model, and when I ran AutoQuant on the model I got an accuracy of 0.09, i.e. 9%. I don't understand what is happening inside AutoQuant; it tries to perform CLE, AdaRound, and batch norm folding. When I pass the MNIST test dataset as the calibration data, I get an accuracy of less than 10%. Why is there such a huge drop even with 32-bit floating point weights and activations? When I print the predictions, almost all of them are the same. Why is this happening? Please find the code file attached. Can you please let me know where I am going wrong? I am following the documentation at https://quic.github.io/aimet-pages/releases/latest/api_docs/tensorflow_auto_quant.html .

  1. Also, how can I verify whether the weights of my model are quantized to int8 or not?
  2. How do I get the quantized model information?
  3. Why is it reporting the W32A32 (32-bit floating point) accuracy instead of the quantized accuracy? Also, which quantized model has the best accuracy, and how do I determine that, i.e. whether it is per-tensor symmetric/asymmetric or per-channel symmetric/asymmetric?
  4. Why is the floating point accuracy the same as the AutoQuant accuracy, as shown in the pictures below? Please, someone, kindly help me; I have been stuck on this issue for a long time. autoquant_mnist_vgg (3).pdf

[image: accuracy after batch norm folding and AutoQuant]

This picture shows the accuracy obtained after performing batch norm folding and AutoQuant operations, but the accuracy is still the same as the floating point accuracy, so there is some mistake on my end that I cannot figure out. Can someone kindly check it and let me know where I am going wrong?

[image]

sandeep1404 commented 8 months ago

Hi team, any update on this issue? Can someone look into it and tell me what is going wrong?

quic-hitameht commented 8 months ago

Hello @sandeep1404, sorry for the delayed response.

It seems there is an issue with your eval_callback function. Before you run AutoQuant.apply(...), you should first simply evaluate the model with your eval_callback function to get the W32A32 (FP32) accuracy; that should match what model.evaluate gives you (Test accuracy: 0.991599977016449) to begin with. This will help us narrow down the issue. Once this is fixed, we can go ahead and apply the AutoQuant API.
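For example, a minimal sanity check along these lines might look like the sketch below. It assumes a Keras model trained on MNIST with integer labels and compiled with an accuracy metric, and an eval_callback taking (model, iterations) as suggested by the linked TensorFlow AutoQuant docs; the preprocessing and batch size are placeholders, not taken from the attached notebook.

```python
import numpy as np
import tensorflow as tf

# Load the MNIST test split; adjust preprocessing to match the training pipeline.
(_, _), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_test = x_test.astype("float32")[..., np.newaxis] / 255.0  # shape (N, 28, 28, 1)

def eval_callback(model: tf.keras.Model, iterations=None) -> float:
    """Top-1 accuracy over (up to) `iterations` batches; None = full test set."""
    batch_size, correct, total = 64, 0, 0
    for batch_idx, start in enumerate(range(0, len(x_test), batch_size)):
        if iterations is not None and batch_idx >= iterations:
            break
        batch_x = x_test[start:start + batch_size]
        batch_y = y_test[start:start + batch_size]               # integer labels
        preds = np.argmax(model.predict(batch_x, verbose=0), axis=1)
        correct += int(np.sum(preds == batch_y))
        total += len(batch_y)
    return correct / total

# `model` is the trained Keras model from the notebook (not defined here).
# The two numbers below should agree (~0.99) *before* AutoQuant is invoked.
# If eval_callback reports ~0.09 instead, the callback itself (label format,
# preprocessing, missing argmax) is the problem, not the quantization.
print("eval_callback FP32 accuracy:", eval_callback(model))
print("model.evaluate accuracy    :", model.evaluate(x_test, y_test, verbose=0)[1])
```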

AutoQuant internally applies BN folding, Cross-Layer Equalization (CLE) and AdaRound in a best-effort manner until the model meets the evaluation goal given by allowed_accuracy_drop.
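Conceptually, that best-effort flow can be sketched as below. This is illustrative pseudocode only, not the actual AIMET implementation or API; the apply_* and eval_quantsim helpers are hypothetical placeholders.

```python
# Illustrative pseudocode of the AutoQuant decision flow; the helper
# functions are placeholders, not the real AIMET API.
def auto_quant_sketch(fp32_model, eval_callback, allowed_accuracy_drop):
    fp32_acc = eval_callback(fp32_model)            # W32A32 baseline
    target_acc = fp32_acc - allowed_accuracy_drop   # exit condition

    candidate = apply_batchnorm_folding(fp32_model)
    if eval_quantsim(candidate, eval_callback) >= target_acc:
        return candidate                            # goal met: stop after BN fold

    candidate = apply_cross_layer_equalization(candidate)
    if eval_quantsim(candidate, eval_callback) >= target_acc:
        return candidate

    candidate = apply_adaround(candidate)
    return candidate                                # best effort reached
```

This also illustrates why a wrong FP32 baseline derails the whole flow: if fp32_acc is near zero, target_acc becomes negative and the very first candidate already "meets" the goal.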

Please find responses to your questions below.

  1. After applying the above PTQ techniques, AutoQuant uses the Quantization Simulation functionality to simulate the effects of quantized hardware. It then generates an encodings JSON file containing quantization scale/offset parameters for each activation and weight tensor of your model, which is consumed by the target hardware along with the model. So you won't get a model with int8 weights from the AutoQuant API; instead you get a quantization encodings JSON file with the quantization information.
  2. The generated encodings JSON file has the model's quantization information (see the sketch after this list for one way to inspect it).
  3. Since the FP32 accuracy computed with your eval_callback function is incorrect, the target accuracy is also incorrect and negative (0.0924 - 0.15, with allowed_accuracy_drop set to 0.15). Because of this, only the BN fold technique is applied before the exit condition is met, which is why your "quantized accuracy" is the same as the FP32 accuracy. Furthermore, you can specify per-tensor vs. per-channel quantization configurations via a config file when creating the AutoQuant object. Please refer to the API doc for more details.
  4. As I said earlier, your FP32 accuracy is the same as the final quantized accuracy because your FP32 accuracy is incorrect (0.0924) with the passed eval_callback function, so only the BN fold technique is applied, and a BN-folded model is mathematically equivalent to the model with BN layers from an inference perspective.
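For question 2, one way to peek inside the generated encodings file is a short script like the one below. The file name and the top-level key names ("activation_encodings", "param_encodings") are assumptions about the typical AIMET encodings format and may differ across AIMET versions; check the actual file AutoQuant wrote to your results directory.

```python
import json

# Placeholder path: use the encodings file AutoQuant exported for your model.
with open("resnet50_mnist.encodings") as f:
    encodings = json.load(f)

# Print a summary of each top-level section. AIMET encodings files typically
# group per-tensor entries for activations and parameters (weights/biases).
for section, entries in encodings.items():
    if isinstance(entries, dict):
        print(f"{section}: {len(entries)} tensors")
    else:
        print(f"{section}: {entries}")

# Inspect one parameter entry: it normally carries bitwidth, scale, offset and
# min/max, which tells you whether 8-bit encodings were actually produced.
params = encodings.get("param_encodings", {})
if params:
    name = next(iter(params))
    print(name, params[name])
```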

Hope this helps. Please let us know if you have further questions.