microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License
14.66k stars 2.93k forks

New restricted asymmetric quantization mode in QDQ mode with zero_point restricted to either 128 or 0 #21398

Open ulfhanebutte opened 3 months ago

ulfhanebutte commented 3 months ago

Describe the feature request

The QDQ process supports both symmetric and asymmetric quantization, the latter by introducing a zero-point offset. Many accelerators do not support a zero-point offset, so symmetric quantization is needed, which is not ideal for tensors that are non-negative, e.g. an output tensor after a ReLU activation function. The requested feature is to allow tensors to be either int8 or uint8, and to use uint8 for tensors that are non-negative. This is equivalent to uint8 with a zero_point of either 128 or 0.
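To spell out the equivalence claimed above, here is a short sketch of the three representations. The symbols are assumptions introduced for illustration only: $x$ is the real value, $s$ the scale, $q$ the int8 code, and $u$ the uint8 code.

```latex
\begin{aligned}
\text{int8, symmetric:}\quad & x \approx s\,q, && q \in [-128, 127] \\
\text{uint8, zero\_point } 128:\quad & x \approx s\,(u - 128), && u = q + 128 \in [0, 255] \\
\text{uint8, zero\_point } 0:\quad & x \approx s\,u, && u \in [0, 255]
\end{aligned}
```

The first two rows describe the same set of representable values, so hardware that handles int8 symmetric tensors and plain uint8 tensors covers both restricted cases.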

Describe scenario use case

An example is a tensor after the ReLU or Sigmoid activation function. Both functions guarantee that all tensor values are non-negative. The new restricted asymmetric quantization mode would provide a zero_point of 0 for such a tensor stored in uint8, while all tensors that have both negative and positive values would be represented in uint8 with a zero_point offset of 128. As this new mode is restricted to only these two cases, accelerator hardware that supports int8 and uint8 tensors can use it.
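The selection rule described here could be sketched roughly as follows. This is a minimal illustration in plain Python, not ONNX Runtime code: the helper names `restricted_qparams`, `quantize`, and `dequantize` are hypothetical, and the scale formulas are one reasonable choice rather than anything specified in this request.

```python
def restricted_qparams(values):
    """Return (scale, zero_point) for uint8 with zero_point limited to 0 or 128.

    Hypothetical helper for illustration; not part of any ONNX Runtime API.
    """
    lo, hi = min(values), max(values)
    if lo >= 0.0:
        # Non-negative tensor (e.g. after ReLU): use the full range [0, 255].
        zero_point = 0
        scale = hi / 255.0
    else:
        # Tensor with negative values: symmetric range shifted by 128,
        # i.e. the same codes as int8 symmetric quantization plus 128.
        zero_point = 128
        scale = max(-lo / 128.0, hi / 127.0)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    return scale, zero_point


def quantize(values, scale, zero_point):
    """Map real values to uint8 codes, clamping to [0, 255]."""
    return [min(255, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map uint8 codes back to real values."""
    return [(c - zero_point) * scale for c in codes]


# A ReLU-style tensor gets zero_point 0; a mixed-sign tensor gets 128.
relu_scale, relu_zp = restricted_qparams([0.0, 0.5, 2.55])
mixed_scale, mixed_zp = restricted_qparams([-1.0, 0.0, 0.5])
mixed_codes = quantize([-1.0, 0.0, 0.5], mixed_scale, mixed_zp)
```

With zero_point 0, a non-negative tensor uses all 256 codes instead of only the upper half of the symmetric range, which is the precision gain the request is after.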

anna244 commented 2 months ago

Hello, I ran into a problem where ONNX Runtime with the QDQ format on the OpenVINO execution provider does not work with asymmetric quantization. Can you please elaborate on which accelerators do not support asymmetric quantization? And what do you mean by accelerators in your context? Do you know whether OpenVINO really does not support asymmetric quantization for the QDQ format? And if so, why?