Open liziru opened 3 years ago
The max-min method essentially finds the maximum absolute value. In the code, I allow a small saturation in max-min, because sometimes the value can be 1.000001, which is > 1 and would otherwise cost one bit of resolution.
Whether you use max-min or KLD (saturated), the C code runs in saturating mode, so a small saturation is fine. But you will need to take care to saturate the input data as well.
There is no Z (zero point) here because this lib currently only supports symmetric quantisation. The S we quantise with is restricted to powers of 2, so it can only be 1, 2, 4, 8 ...
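To illustrate why a small saturation helps, here is a minimal sketch of picking the number of fractional bits from the maximum absolute value. It is not the library's actual code: the function name `dec_bits_max_min`, the `saturate` flag, and the exact shrink factor of `1/2**bit_width` are assumptions made for this example.

```python
import math

def dec_bits_max_min(values, bit_width=8, saturate=True):
    """Hypothetical helper: fractional bits for symmetric, power-of-2 quantisation."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1  # all zeros: any scale works, use the finest one
    if saturate:
        # Shrink the range slightly so that a value such as 1.000001
        # saturates a little instead of costing a whole fractional bit.
        max_abs -= max_abs / 2 ** bit_width
    int_bits = math.ceil(math.log2(max_abs))  # bits needed left of the binary point
    return (bit_width - 1) - int_bits         # one bit is reserved for the sign

print(dec_bits_max_min([1.000001, -0.5], saturate=False))  # 6 fractional bits
print(dec_bits_max_min([1.000001, -0.5], saturate=True))   # 7 fractional bits
```

With the small saturation, the value 1.000001 is clipped to 127/128 in 8-bit data, and everything else keeps one extra bit of resolution.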
Thanks for your reply.
The main goal of quantization is to estimate 'S', and convert float results into integer results, which is described in the following picture by multiplying the 'S':
I checked the 'weights.h' dumped by the function 'generate_model'. What is the purpose of the macros tagged below?
Or what is your quantization formula? What is the difference between the quantization above and yours?
Looking forward to your reply.
Details of the quantisation used here can be found in ARM's blog post about CMSIS-NN and in the post from TFLite Micro. The differences between this scheme and the equation you showed are:
- We don't use Z, i.e. Z = 0.
- S is limited to powers of 2 because shifting is very fast on an MCU, unlike dividing by an arbitrary S.
- The decimal bits (dec_bit) tell you how to recover the real value: the real data is the stored data divided by pow(2, dec_bit).
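The points above can be sketched with a fixed-point multiply, where rescaling the result is just a right shift. This is only an illustration in Python, not the C kernels of the library; the names `saturate_int8` and `mul_q7` are hypothetical, and the sketch assumes the shift amount is non-negative.

```python
def saturate_int8(x):
    """Clamp an integer to the signed 8-bit range."""
    return max(-128, min(127, x))

def mul_q7(x, x_dec, y, y_dec, out_dec):
    """Multiply two fixed-point integers and rescale to out_dec fractional bits.

    The product has x_dec + y_dec fractional bits, so moving it to out_dec
    fractional bits is a single right shift because every scale is a power of 2.
    """
    shift = x_dec + y_dec - out_dec  # assumed >= 0 in this sketch
    return saturate_int8((x * y) >> shift)

# 0.5 * 0.25 with 7 fractional bits everywhere: 64 * 32 >> 7 = 16, i.e. 16/128 = 0.125
print(mul_q7(64, 7, 32, 7, 7))
# 3.125 * 3.125 does not fit in 8 bits with 5 fractional bits, so it saturates to 127
print(mul_q7(100, 5, 100, 5, 5))
```

With an arbitrary (non power-of-2) S, the shift would have to become an integer multiply or divide, and Z = 0 means no offset terms appear in the product.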
In my case, S is small (for example, 1/256) because the float data range is small.
I am confused about the last point you mentioned:
> The decimal bit present the real data is the data divided by the pow(2, dec_bit)

Do you obtain the quantized integer data by 'R/pow(2, dec_bit)' (where R is the original float data)? If so, the quantized result gets smaller, and this computation seems wrong. I am not sure whether my understanding is right.
integer / S = float, where S = pow(2, N).
For example, if N = 7 and integer = 64, then it represents 64/128 = 0.5.
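As a small sketch of both directions (the helper names `quantize` and `dequantize` are made up for this example, and rounding to nearest is an assumption): float to integer multiplies by pow(2, N), and integer to float divides by pow(2, N).

```python
def quantize(value, dec_bits, bit_width=8):
    """Float -> integer: scale up by 2**dec_bits, round, then saturate."""
    q = round(value * 2 ** dec_bits)
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    return max(lo, min(hi, q))

def dequantize(q, dec_bits):
    """Integer -> float: divide by 2**dec_bits."""
    return q / 2 ** dec_bits

print(quantize(0.5, 7))       # 64
print(dequantize(64, 7))      # 0.5
print(quantize(1.000001, 7))  # 127, saturated because 128 does not fit in int8
```

So the division by pow(2, dec_bit) only applies when reading an integer back as a real value; it is not how the integer is produced.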
Got it. Thank you very much.
I checked the function 'generate_model' carefully, but I am still confused about the 'min_max' quantization approach. I also looked through the related documents and did not find the answers. For example, why does the 'find_dec_bits_max_min' function do the following?
![image](https://user-images.githubusercontent.com/34911790/121516879-4da01380-ca21-11eb-842e-da5f84527f13.png)
What I know about 'min-max' quantization is that it is implemented as in the following picture:
![image](https://user-images.githubusercontent.com/34911790/121516528-e4200500-ca20-11eb-9657-91660667d665.png)
Can you give more information about the quantization? I would appreciate it very much.
Looking forward to your reply.