umeannthtome opened 7 years ago
There is a range estimation function which is done after every frame of inference. The quantization is not only dependent on the current input - it also depends on the estimated range (for various blobs) from the previous runs.
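That history dependence can be sketched roughly like this. This is a hypothetical illustration, not the project's actual code; the class name, the `momentum` parameter, and the blending rule are all assumptions about how a running range estimate might be kept per blob.

```python
class RangeEstimator:
    """Illustrative per-blob range tracker: the quantization range is a
    blend of previous runs' estimates and the current frame's range."""

    def __init__(self, momentum=0.95):
        self.momentum = momentum   # weight given to the historical estimate
        self.range_max = None

    def update(self, blob_values):
        current = max(abs(v) for v in blob_values)
        if self.range_max is None:
            # First frame: take the observed range directly.
            self.range_max = current
        else:
            # Later frames only nudge the estimate, so the quantization
            # depends on history, not just the current input.
            self.range_max = (self.momentum * self.range_max
                              + (1 - self.momentum) * current)
        return self.range_max


est = RangeEstimator()
est.update([0.5, -2.0, 1.0])   # first frame sets the range to 2.0
est.update([0.1, -0.2])        # second frame nudges it toward 0.2
```

Because the estimate carries over between frames, even identical inputs can be quantized with slightly different ranges depending on what was fed through the network before them.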
The difference between CPU mode and GPU mode should be investigated - unfortunately I am not able to spend time on this due to another project - if you make any progress please let us know.
Thanks, Manu.
I noticed the update of the data range too. But I have not figured out how the CPU and GPU computations differ.
I also suggest increasing quantization_start_iter to a higher value (say 5 or 10) for better stability.
Hi,
Is there any random number involved in the quantization process except the random value in stochastic rounding scheme?
I ask because I noticed that if I turn on quantization and feed a batch of 8 identical test images into my network, the semantic segmentation outputs turn out to be different for all 8 of them.
William
Edit 1: The output, although it still varies, becomes more stable when power2_range is set to true.
Edit 2: The outputs of CPU mode and GPU mode are also different from one another.
Edit 3: The output becomes more stable when a larger bitwidth is used.
Edit 4: Quantization actually starts after the first image during inference when Quantization_Start is set to 1.
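A possible explanation for Edit 1: snapping the range to a power of two collapses many nearby range estimates onto the same value, so the quantization step size stops jittering frame to frame. A small sketch (the function name and exact snapping rule are assumptions about what a power2_range-style option does):

```python
import math

def power2_range(r):
    """Snap a positive range estimate up to the next power of two.
    Illustrative of a power2_range-style option, not the actual code."""
    return 2.0 ** math.ceil(math.log2(r))

# Slightly different running estimates snap to the same quantization range:
power2_range(1.7)   # -> 2.0
power2_range(1.9)   # -> 2.0
power2_range(2.0)   # -> 2.0 (already a power of two)
```

With a continuously updated range, every small change shifts the step size; with power-of-two snapping, the step only changes when the estimate crosses a power-of-two boundary, which matches the observed extra stability.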