New restricted asymmetric quantization mode in QDQ mode with zero_point restricted to either 128 or 0

Describe the feature request

The QDQ process supports both symmetric and asymmetric quantization, the latter by introducing a zero-point offset. Many accelerators do not support a zero-point offset, so symmetric quantization is needed. That is not ideal for tensors that are non-negative, e.g. an output tensor after a ReLU activation function, because half of the int8 range goes unused. The requested feature is to allow tensors to be int8 or uint8 and to use uint8 for tensors that are non-negative. This is equivalent to uint8 with a zero_point of either 128 or 0.
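The equivalence claimed above (symmetric int8 is the same as uint8 with a zero_point of 128) can be checked with a small NumPy sketch. The helper names `quantize` and `dequantize` and the example scale are illustrative assumptions, not part of the onnxruntime API.

```python
import numpy as np

# Hypothetical helpers for illustration only; not onnxruntime functions.
def quantize(x, scale, zero_point, dtype):
    """Linear quantization: round(x / scale) + zero_point, saturated to dtype."""
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    """Inverse mapping back to floating point."""
    return (q.astype(np.int32) - zero_point) * scale

x = np.array([-1.0, -0.25, 0.0, 0.5, 0.99], dtype=np.float32)
scale = 1.0 / 128.0  # assumed scale covering roughly [-1, 1)

q_int8 = quantize(x, scale, 0, np.int8)       # symmetric int8, zero_point 0
q_uint8 = quantize(x, scale, 128, np.uint8)   # uint8, zero_point 128

# The two encodings differ only by a constant shift of 128,
# so they dequantize to identical values.
shift_matches = np.array_equal(q_uint8.astype(np.int32) - 128,
                               q_int8.astype(np.int32))
same_values = np.allclose(dequantize(q_int8, scale, 0),
                          dequantize(q_uint8, scale, 128))
```

Both checks hold because the uint8 saturation range [0, 255] is the int8 range [-128, 127] shifted by 128.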

Describe scenario use case

An example is a tensor produced by a ReLU or Sigmoid activation function. Both functions guarantee that all tensor values are non-negative. The new restricted asymmetric quantization mode would assign a zero_point of 0 to such a tensor stored in uint8, while every tensor that holds both negative and positive values would be represented in uint8 with a zero_point of 128. Because the new mode is restricted to these two cases, accelerator hardware that supports int8 and uint8 tensors can use it.
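A minimal sketch of how the requested mode might pick its parameters is shown below. The function names and the scale formulas are assumptions made for illustration; they are not existing onnxruntime quantization options.

```python
import numpy as np

# Hypothetical parameter selection for the requested mode (an assumption,
# not an existing onnxruntime API).
def restricted_asymmetric_params(x):
    """Return (scale, zero_point) with zero_point restricted to 0 or 128."""
    x_min = float(np.min(x))
    x_max = float(np.max(x))
    if x_min >= 0.0:
        # Non-negative tensor (e.g. after ReLU or Sigmoid): use the full
        # uint8 range [0, 255] with zero_point 0.
        zero_point = 0
        scale = x_max / 255.0
    else:
        # Mixed-sign tensor: uint8 with zero_point 128, which matches
        # symmetric int8 shifted by 128.
        zero_point = 128
        scale = max(abs(x_min) / 128.0, x_max / 127.0)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    return scale, zero_point

def quantize_uint8(x, scale, zero_point):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

relu_out = np.array([0.0, 0.5, 2.0])   # non-negative values
mixed = np.array([-1.0, 0.5])          # negative and positive values

relu_scale, relu_zp = restricted_asymmetric_params(relu_out)
mixed_scale, mixed_zp = restricted_asymmetric_params(mixed)

q_relu = quantize_uint8(relu_out, relu_scale, relu_zp)
q_mixed = quantize_uint8(mixed, mixed_scale, mixed_zp)
```

With zero_point 0 the non-negative tensor uses all 256 levels, whereas a symmetric int8 encoding of the same tensor would use only the non-negative half of its range.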

microsoft / onnxruntime

New restricted asymmetric quantization mode in QDQ mode with zero_point restricted to either 128 or 0 #21398
