mciprian13 opened this issue 3 years ago (status: Open)
@jfix71 Do the above problems make sense? Do I have your blessing to solve the mentioned problems?
@mciprian13 I think we intended to allow some numerical changes, because they can improve performance, under the assumption that the model could tolerate slight changes in numerics. We have a cctx flag `enableQuantParamChanges` which is sporadically used to disable some optimizations that would change numerics. I wonder if it would make sense to just skip certain optimizations using a flag like this (or other/new flag(s)) when we run the optimizer? Then the user has control over this precision/performance tradeoff.
@jfix71 I was not aware of this flag. Thanks for pointing it out. I will think about using it. Thanks!
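(For reference, a minimal sketch of what using such a flag could look like from the caller's side, assuming the flag mentioned above is the one in `cctx.optimizationOpts`; the exact header paths and the `optimizeFunction` signature are written from memory and may differ in the actual tree.)

```cpp
// Sketch only: assumes the cctx flag mentioned above lives on
// cctx.optimizationOpts; field and header names may differ.
#include "glow/Optimizer/GraphOptimizer/CompilationContext.h"
#include "glow/Optimizer/GraphOptimizer/GraphOptimizer.h"
#include "glow/Support/Error.h"

void optimizePreservingNumerics(glow::Function *F, const glow::Backend &backend) {
  glow::CompilationContext cctx;
  // Ask the optimizer to skip transformations that change the quantization
  // parameters (and hence the numerics) of the graph.
  cctx.optimizationOpts.enableQuantParamChanges = false;
  EXIT_ON_ERR(glow::optimizeFunction(F, backend, cctx));
}
```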
The graph optimizations should not change the numerical behavior of a graph before/after transformation. Some graph optimizations for quantized operations currently change the numerical behavior:

- Fusing multiple `RescaleQuantized` nodes together is not mathematically equivalent to a single fused `RescaleQuantized` because each `RescaleQuantized` changes the dynamic range of the data and hence saturates the data between different limits. Saturating the data successively between different limits is not equivalent to saturating it once using the limits of the final `RescaleQuantized`; the fused node should instead use the saturation limits given by the equivalent minimum/maximum range. E.g. having 3 subsequent `RescaleQuantized` nodes with the min/max output limits [1, 3], [0, 4], [2, 4], Glow currently uses the type of the last node (i.e. saturates using the limits [2, 4]), but this is not correct. The correct saturation limits are [2, 3], the intersection of all the ranges (see the first sketch below).
- Merging a `RescaleQuantized` into a `Dequantize` node input should only be done if the input range of the `RescaleQuantized` is smaller than (is included in) the range of the `Dequantize` input, i.e. if the rescale cannot saturate any value (second sketch below).
- Optimizing a `Dequantize` followed by a `Quantize` into a NOP loses precision and should only be done during quantization (where the precision loss is intentional) and not during the generic graph optimizations (third sketch below).
- Graph transformations involving the `RescaleQuantized` node should be done more rigorously. TensorFlowLite, for example, uses the rescale node to implement the quantized Relu/Clip activations: the clipping is encoded in the output range of the `RescaleQuantized` node, so removing such a node while fusing it with others (other `RescaleQuantized` or `Dequantize` nodes) loses the activation during graph optimizations. The output range of such a `RescaleQuantized` node should be preserved (last sketch below).

The mentioned problems were discovered while working on #5723.
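To make the first point concrete, here is a small standalone C++ sketch (plain code, not Glow internals) showing that chaining the three saturations is equivalent to a single saturation to the intersection of the ranges [2, 3], not to the last range [2, 4]:

```cpp
#include <algorithm>
#include <cstdio>

// Saturate a value to [lo, hi], as each RescaleQuantized does with the
// min/max of its output type.
static float saturate(float v, float lo, float hi) {
  return std::clamp(v, lo, hi);
}

int main() {
  for (float v = -1.0f; v <= 5.0f; v += 0.5f) {
    // What the original graph computes: saturate three times in sequence,
    // using the output ranges [1, 3], [0, 4], [2, 4] from the example above.
    float chained = saturate(saturate(saturate(v, 1, 3), 0, 4), 2, 4);
    // What the current fusion computes: saturate once with the last range.
    float lastRange = saturate(v, 2, 4);
    // What the fusion should compute: saturate once with the intersection
    // of all the ranges, i.e. [2, 3].
    float intersection = saturate(v, 2, 3);
    std::printf("v=%4.1f chained=%3.1f lastRange=%3.1f intersection=%3.1f\n",
                v, chained, lastRange, intersection);
    // chained always matches intersection; lastRange differs, e.g. for
    // v = 3.5 it gives 3.5 instead of 3.0.
  }
  return 0;
}
```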
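The condition from the second point can be expressed as a simple range-inclusion check; the helper below is hypothetical and only illustrates the intended rule, it is not Glow API:

```cpp
#include <cassert>

// Hypothetical floating-point range covered by a quantized type
// (illustration only, not a Glow data structure).
struct Range {
  float min;
  float max;
};

// Dropping a RescaleQuantized that feeds a Dequantize is only safe when the
// rescale cannot saturate anything, i.e. when the range of the rescale input
// is already included in the range of the Dequantize input (the rescale
// output range).
static bool canDropRescaleBeforeDequantize(const Range &rescaleIn,
                                           const Range &dequantizeIn) {
  return rescaleIn.min >= dequantizeIn.min && rescaleIn.max <= dequantizeIn.max;
}

int main() {
  // Rescale input range [0, 3] is inside the Dequantize input range [-1, 4]:
  // the rescale never clips, so removing it keeps the numerics intact.
  assert(canDropRescaleBeforeDequantize({0, 3}, {-1, 4}));
  // Rescale input range [-2, 5] exceeds [-1, 4]: the rescale saturates some
  // values, so removing it would change the numerics.
  assert(!canDropRescaleBeforeDequantize({-2, 5}, {-1, 4}));
  return 0;
}
```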
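For the third point, a generic affine-quantization round trip (again a standalone sketch, not Glow code) shows that a `Dequantize` followed by a `Quantize` with different quantization parameters is not the identity, so eliminating the pair changes the values seen downstream:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Minimal affine int8 quantization helpers (illustration only).
static float dequantize(int8_t q, float scale, int offset) {
  return scale * (static_cast<float>(q) - static_cast<float>(offset));
}

static int8_t quantize(float v, float scale, int offset) {
  int q = static_cast<int>(std::lround(v / scale)) + offset;
  return static_cast<int8_t>(std::clamp(q, -128, 127));
}

int main() {
  // q1 represents 0.7 with scale 0.1 and offset 0.
  const float scale1 = 0.1f;
  const int8_t q1 = 7;

  // The graph dequantizes q1 and re-quantizes it with a different scale.
  const float scale2 = 0.3f;
  const int8_t q2 = quantize(dequantize(q1, scale1, 0), scale2, 0);

  // The Dequantize/Quantize pair is not a NOP: the represented value moves
  // from 0.70 to 0.60, so eliminating the pair changes the numerics.
  std::printf("value before the pair: %.2f\n", dequantize(q1, scale1, 0));
  std::printf("value after the pair:  %.2f\n", dequantize(q2, scale2, 0));
  return 0;
}
```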
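Finally, a minimal illustration of the last point: in the TensorFlowLite-style pattern the Relu6 clipping lives only in the `[0, 6]` output range of the rescale, so folding the rescale into a neighboring node with a wider range silently drops the activation (the `[-8, 8]` range below is a made-up example):

```cpp
#include <algorithm>
#include <cstdio>

// Saturate to the floating-point range of a quantized type.
static float saturate(float v, float lo, float hi) {
  return std::clamp(v, lo, hi);
}

int main() {
  // TensorFlowLite-style pattern: the Relu6 after a quantized Conv2D is not a
  // separate node; it is encoded as the [0, 6] output range of the rescale.
  const float reluLo = 0.0f, reluHi = 6.0f;
  // Hypothetical wider range of the node the rescale gets fused into.
  const float fusedLo = -8.0f, fusedHi = 8.0f;

  for (float v : {-3.0f, 2.5f, 7.0f}) {
    float kept = saturate(v, reluLo, reluHi);    // activation applied
    float fused = saturate(v, fusedLo, fusedHi); // activation silently lost
    std::printf("v=%5.1f with rescale=%4.1f after fusion=%4.1f\n",
                v, kept, fused);
  }
  return 0;
}
```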