microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

How to configure a TensorRT network to quantize convolution layer weights in per-tensor mode? #4018

Closed un-knight closed 3 years ago

un-knight commented 3 years ago

TensorRT supports two ways to build a quantization engine:

- NNI speeds up a quantized model through the TensorRT PTQ dynamic range API. However, by default this API quantizes activations per-tensor while quantizing convolution weights per-channel.
- The NNI QAT module quantizes both activations and weights per-tensor, which differs from TensorRT PTQ.
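The difference between the two granularities can be illustrated with a small NumPy sketch (the weight tensor, variable names, and symmetric int8 scheme here are illustrative assumptions, not NNI's or TensorRT's actual implementation):

```python
import numpy as np

# Hypothetical conv weight tensor, laid out (out_channels, in_channels, kH, kW).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3)).astype(np.float32)

def fake_quant(w, scale):
    """Symmetric int8 quantize-dequantize: round(w / scale), clipped to [-127, 127]."""
    q = np.clip(np.round(w / scale), -127, 127)
    return (q * scale).astype(np.float32)

# Per-tensor: a single scale for the whole weight tensor
# (the granularity the NNI QAT module uses for weights).
scale_tensor = np.abs(w).max() / 127.0
w_per_tensor = fake_quant(w, scale_tensor)

# Per-channel: one scale per output channel
# (TensorRT's default granularity for convolution weights).
scale_channel = np.abs(w).max(axis=(1, 2, 3), keepdims=True) / 127.0
w_per_channel = fake_quant(w, scale_channel)

print("per-tensor  MSE:", np.mean((w - w_per_tensor) ** 2))
print("per-channel MSE:", np.mean((w - w_per_channel) ** 2))
```

Because each per-channel scale is no larger than the global per-tensor scale, per-channel quantization typically reconstructs the weights with lower error, which is why it is the default for conv weights, but it also means the engine's behavior will not match a per-tensor QAT simulation exactly.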

Are there any methods to config TensorRT to quantize weights in a per-tensor mode?

linbinskn commented 3 years ago

It seems that TensorRT hasn't exposed an API for choosing between per-channel and per-tensor weight quantization yet.

un-knight commented 3 years ago

> It seems that TensorRT hasn't exposed an API for choosing between per-channel and per-tensor weight quantization yet.

Yep, TensorRT only supports per-channel weight quantization in PTQ mode.