microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

How to configure a TensorRT network to quantize convolution layer weights in per-tensor mode? #4018

Closed un-knight closed 3 years ago

un-knight commented 3 years ago

TensorRT supports two ways to build a quantization engine:

- NNI speeds up a quantized model through the TensorRT PTQ dynamic range API. However, by default this API quantizes activations per-tensor while quantizing convolution weights per-channel.
- The NNI QAT module quantizes both activations and weights per-tensor, which differs from TensorRT PTQ.
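The difference between the two granularities can be illustrated with a small NumPy sketch (the weight tensor, variable names, and symmetric int8 scheme here are illustrative assumptions, not NNI's or TensorRT's actual implementation):

```python
import numpy as np

# Hypothetical conv weight tensor, laid out (out_channels, in_channels, kH, kW).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 3, 3)).astype(np.float32)

def fake_quant(w, scale):
    """Symmetric int8 quantize-dequantize: round(w / scale), clipped to [-127, 127]."""
    q = np.clip(np.round(w / scale), -127, 127)
    return (q * scale).astype(np.float32)

# Per-tensor: a single scale for the whole weight tensor
# (the granularity the NNI QAT module uses for weights).
scale_tensor = np.abs(w).max() / 127.0
w_per_tensor = fake_quant(w, scale_tensor)

# Per-channel: one scale per output channel
# (TensorRT's default granularity for convolution weights).
scale_channel = np.abs(w).max(axis=(1, 2, 3), keepdims=True) / 127.0
w_per_channel = fake_quant(w, scale_channel)

print("per-tensor  MSE:", np.mean((w - w_per_tensor) ** 2))
print("per-channel MSE:", np.mean((w - w_per_channel) ** 2))
```

Because each per-channel scale is no larger than the global per-tensor scale, per-channel quantization typically reconstructs the weights with lower error, which is why it is the default for conv weights, but it also means the engine's behavior will not match a per-tensor QAT simulation exactly.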

Are there any methods to config TensorRT to quantize weights in a per-tensor mode?

linbinskn commented 3 years ago

It seems that TensorRT hasn't exposed an API for choosing between per-channel and per-tensor weight quantization yet.

un-knight commented 3 years ago

> It seems that TensorRT hasn't exposed an API for choosing between per-channel and per-tensor weight quantization yet.

Yep, TensorRT only supports per-channel weight quantization in PTQ mode.