pytorch / glow

Compiler for Neural Network hardware accelerators
Apache License 2.0

[Quantization] Graph optimizations are not correct #5729

Open mciprian13 opened 3 years ago

mciprian13 commented 3 years ago

Graph optimizations should not change the numerical behavior of a graph. However, some of the graph optimizations for quantized operations do change the numerical behavior:

  1. Merging multiple RescaleQuantized nodes into one is not mathematically equivalent to the original chain, because each RescaleQuantized changes the dynamic range of the data and therefore saturates it to different limits. Saturating successively to different limits is not the same as saturating once to the limits of the final RescaleQuantized; the fused node should instead saturate to the limits given by the intersection of all the ranges. E.g. for 3 consecutive RescaleQuantized nodes with the min/max output limits [1,3], [0,4], [2,4], Glow currently uses the type of the last node (i.e. saturates to the limits [2,4]), but this is not correct. The correct saturation limits are [2,3], the intersection of all the ranges (see the sketch after this list).
  2. Merging a RescaleQuantized into the input of a Dequantize node should only be done if the input range of the RescaleQuantized is smaller than (is contained in) the range of the Dequantize input.
  3. Merging a Dequantize followed by a Quantize into a NOP loses precision and should only be done during quantization (where the precision loss is intentional), not during generic graph optimizations.
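To make point 1 concrete, here is a small standalone C++ sketch (not Glow code; it only models the clamping in the real-value domain and ignores the requantization itself). Clamping a value through [1,3], [0,4] and [2,4] in sequence gives the same result as a single clamp to the intersection [2,3], but not as a single clamp to the last range [2,4]:

```cpp
#include <algorithm>
#include <cstdio>

// Saturate v to the closed interval [lo, hi].
static float clamp(float v, float lo, float hi) {
  return std::min(std::max(v, lo), hi);
}

int main() {
  // Output ranges of three consecutive RescaleQuantized nodes.
  const float ranges[3][2] = {{1.f, 3.f}, {0.f, 4.f}, {2.f, 4.f}};

  for (float v : {0.f, 1.f, 2.f, 3.5f, 5.f}) {
    // Reference behavior: apply the three clamps one after another.
    float sequential = v;
    for (const auto &r : ranges) {
      sequential = clamp(sequential, r[0], r[1]);
    }
    // Current fused behavior described above: clamp once to the last range.
    float lastOnly = clamp(v, 2.f, 4.f);
    // Proposed fused behavior: clamp once to the intersection of all ranges.
    float intersection = clamp(v, 2.f, 3.f);

    std::printf("v=%4.1f  sequential=%4.1f  lastOnly=%4.1f  intersection=%4.1f\n",
                v, sequential, lastOnly, intersection);
  }
  return 0;
}
```

For v = 3.5 and v = 5.0 the "lastOnly" column diverges from the sequential result, while the "intersection" column always matches it.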

Graph transformations involving the RescaleQuantized node should be done more rigorously. TensorFlowLite, for example, uses its rescale node to implement quantized Relu/Clip activations.
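The scales and offsets below are made up for illustration (this is not TFLite or Glow source). The sketch shows why a rescale can act as a fused Relu/Clip: if the output quantization parameters only cover the real range [0, 6], any value outside that range saturates during requantization, which is exactly a ReLU6 clamp:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize a real value with the given scale/offset, saturating to int8.
static int8_t quantize(float v, float scale, int32_t offset) {
  int32_t q = static_cast<int32_t>(std::round(v / scale)) + offset;
  return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

static float dequantize(int8_t q, float scale, int32_t offset) {
  return scale * static_cast<float>(q - offset);
}

int main() {
  // Input covers roughly [-10, 10]; output parameters are chosen so that the
  // int8 range maps exactly onto the real range [0, 6].
  const float inScale = 20.f / 255.f;  const int32_t inOffset = 0;
  const float outScale = 6.f / 255.f;  const int32_t outOffset = -128;

  for (float v : {-3.f, 0.f, 2.5f, 6.f, 9.f}) {
    int8_t qIn = quantize(v, inScale, inOffset);
    // "Rescale": requantize from the input parameters to the output parameters.
    int8_t qOut = quantize(dequantize(qIn, inScale, inOffset), outScale, outOffset);
    std::printf("in=%5.2f  rescaled=%5.2f  relu6=%5.2f\n", v,
                dequantize(qOut, outScale, outOffset),
                std::min(6.f, std::max(0.f, v)));
  }
  return 0;
}
```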

The mentioned problems were discovered while working on #5723.

mciprian13 commented 3 years ago

@jfix71 Do the above problems make sense? Do I have your blessing to fix the mentioned problems?

jfix71 commented 3 years ago

@mciprian13 I think we intended to allow some numerical changes, because they can improve performance, under the assumption that the model could tolerate slight changes in numerics. We have a cctx flag enableQuantParamChanges which is sporadically used to disable some optimizations that would change numerics here. I wonder if it would make sense to just skip certain optimizations using a flag like this (or other/new flag(s)) when we run the optimizer? Then the user has control over this precision/performance tradeoff.
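A rough sketch of what gating a numerics-changing rewrite on such a flag could look like (hypothetical code, not Glow's actual optimizer; only the enableQuantParamChanges name comes from the existing cctx option):

```cpp
#include <cstdio>

// Hypothetical options struct standing in for the relevant part of cctx.
struct OptimizationOptions {
  // When false, skip optimizations that may alter quantized numerics.
  bool enableQuantParamChanges = false;
};

// Hypothetical pass entry point; returns true if the rewrite was applied.
static bool tryMergeRescaleChain(const OptimizationOptions &opts) {
  if (!opts.enableQuantParamChanges) {
    // The merged rescale may clamp to different limits than the original
    // chain, so only merge when the user opted into numeric changes.
    return false;
  }
  // ... perform the merge here ...
  return true;
}

int main() {
  OptimizationOptions conservative;    // default: preserve numerics exactly
  OptimizationOptions fast;
  fast.enableQuantParamChanges = true; // user accepts slight numeric drift

  std::printf("conservative merged: %d\n", tryMergeRescaleChain(conservative));
  std::printf("fast merged: %d\n", tryMergeRescaleChain(fast));
  return 0;
}
```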

mciprian13 commented 3 years ago

@jfix71 I was not aware of this flag. Thanks for pointing it out. I will think about using it. Thanks!