Open nainag opened 2 years ago
I don't think NVDLA supports low-precision quantized networks. Even 8-bit (normally quantized) networks are compiled with its own compiler. You might be able to achieve pseudo low-precision, i.e., 4-bit values stored in 8-bit data, by providing a calibration table. However, I haven't tried anything like that. This idea might also run into issues with the model itself, as some models can't be implemented on NVDLA.
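The "4-bit written on 8-bit data" idea above is essentially fake quantization: restrict values to a 4-bit grid but keep the int8 storage type the hardware expects. A minimal sketch of that (assuming symmetric, per-tensor scaling with NumPy; `fake_quantize` is a hypothetical helper, not part of NVDLA or TensorRT):

```python
import numpy as np

def fake_quantize(x, num_bits=4):
    # Symmetric fake quantization: values land on a 4-bit grid
    # ([-7, 7] for num_bits=4) but are stored as int8, mimicking
    # "4-bit written on 8-bit data".
    qmax = 2 ** (num_bits - 1) - 1          # 7 for 4-bit
    max_abs = np.max(np.abs(x))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

weights = np.random.randn(8).astype(np.float32)
q, scale = fake_quantize(weights, num_bits=4)
dequant = q.astype(np.float32) * scale      # what the hardware would "see"
```

A calibration table would then carry the per-layer `scale` values; whether NVDLA's compiler accepts scales chosen this way is untested, as noted above.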
Hi,
Has anyone tried deploying a low-precision quantized network (int4, int5, etc.) on NVDLA?
If so, please let me know the steps: were you able to successfully generate the calibration table using TensorRT, and does the hardware support that level of quantization?
I would really appreciate any help in this direction.
Thanks!