While going through your YouTube video explanation on Quantisation, I came across this doubt when validating the formulas for `scale` and `zero_point` for Asymmetric and Symmetric Quantization, and found a mismatch with the values in the notebook code examples.
🤯 Observation
In the above screenshot of the Post-Training Quantisation notebook:
the MinMaxObserver for Linear layer1 has calculated the min (beta) and max (alpha) values as min_val=-53.58397674560547, max_val=34.898128509521484
Using these min and max values, the scale and zero_point for QuantizedLinear layer1 are scale=0.6967094540596008, zero_point=77
❓ Question/Doubt
Formulae for calculating s and z for Asymmetric Quantization
Considering the default qscheme = torch.per_tensor_affine & dtype=torch.quint8 for MinMaxObserver, the quantization used by the torch quantization library is asymmetric.
Shouldn't the scale and zero_point for the QuantizedLinear layer, according to Asymmetric Quantization to 8-bit INT, be:
scale=0.34698863, zero_point=100.57?
Why is the scale value in the notebook screenshot 2× the value calculated from the formulae, i.e. 2 * 0.34698863 ≈ 0.6967094540596008?
@hkproj Can you please shed some light on the calculation?
Thank you