While going through your YouTube video explanation on Quantisation, I came across this doubt when validating the formulas for `scale` and `zero_point` for Asymmetric and Symmetric Quantization, and found a mismatch with the values in the notebook code examples.
🤯 Observation
In the above screenshot of the Post-Training Quantisation notebook:
the MinMaxObserver for Linear layer1 has calculated the min (beta) and max (alpha) values as min_val=-53.58397674560547, max_val=34.898128509521484
Using these min and max values, the scale and zero_point for QuantizedLinear layer1 are scale=0.6967094540596008, zero_point=77
❓ Question/Doubt
Formulae for calculating s and z for Asymmetric Quantization
Considering the default qscheme = torch.per_tensor_affine & dtype=torch.quint8 for MinMaxObserver, the quantization used by the torch quantization library is asymmetric.
Shouldn't the scale and zero_point for the QuantizedLinear layer, according to Asymmetric Quantization to 8-bit INT, be:
scale=0.34698863, zero_point=100.57?
Why is the scale value in the notebook screenshot 2× the value calculated from the formulae, i.e. 2 * 0.34698863 ≈ 0.6967094540596008?
@hkproj Can you please shed some light on the calculation?
Thank you