Hi,
I have traced your code, but I could not find where the weights are quantized.
In main_cls.py I enabled the options --quantize and --equalize with mobilenetv2. After the program runs cross_layer_equalization() and set_quant_minmax(), I observed the forward pass of QuantNConv2d during inference: self.quant(input) behaves like fake quantization (the result is still float), and then conv2d() is simply called. There is no quantization of the weights anywhere in that process.
So my question is: is there no weight quantization in this code?
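For reference, the forward behavior described above looks roughly like the following sketch. This is a hypothetical reconstruction, not the repository's actual QuantNConv2d implementation; `quant` stands in for the activation fake-quantizer, which is assumed to return a float tensor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantConv2dSketch(nn.Conv2d):
    # Hypothetical sketch of the observed behavior:
    # the input is fake-quantized (the result stays float),
    # then a plain float conv2d runs with unmodified weights.
    def __init__(self, *args, quant=None, **kwargs):
        super().__init__(*args, **kwargs)
        # default to identity if no fake-quantizer is supplied
        self.quant = quant if quant is not None else (lambda x: x)

    def forward(self, x):
        x = self.quant(x)  # fake quantization: dtype is still float
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Note that nothing in this forward touches `self.weight`, which is exactly the observation in the question.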
Weight quantization is done ahead of time, in quantize.py, before inference starts.
It's faster than quantizing the weights at every iteration, although there may be a small numerical difference.
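As a rough illustration of that approach (not the repository's actual API), quantizing the weights once before inference might look like the sketch below. `fake_quantize` and `quantize_weights_inplace` are hypothetical helpers implementing simple asymmetric min-max fake quantization:

```python
import torch

def fake_quantize(t, num_bits=8):
    # Asymmetric min-max fake quantization: map to the integer grid,
    # then back to float. Values become quantized, but the tensor
    # itself stays in floating point.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (t.max() - t.min()) / (qmax - qmin)
    if scale == 0:
        return t.clone()
    zero_point = qmin - torch.round(t.min() / scale)
    q = torch.clamp(torch.round(t / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

@torch.no_grad()
def quantize_weights_inplace(model, num_bits=8):
    # Rewrite each conv/linear weight with its fake-quantized value once,
    # before inference, instead of re-quantizing at every forward pass.
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            module.weight.copy_(fake_quantize(module.weight, num_bits))
```

With the weights rewritten in place, the forward pass can run a plain conv2d(); only the activations still need self.quant(input) at runtime.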