nvdla / hw

RTL, Cmodel, and testbench for NVDLA

Deploying a quantized network on NVDLA #355

Open nainag opened 2 years ago

nainag commented 2 years ago

Hi,

Has anyone tried deploying a low-precision quantized network (int4, int5, etc.) on NVDLA?

If so, please let me know the steps, whether you were able to successfully generate the calibration table using TensorRT, and whether the hardware supports this level of quantization.

I would really appreciate any help in this direction.

Thanks!
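(For reference, a calibration table is essentially a per-tensor list of scale factors. Below is a minimal NumPy sketch of symmetric max-value calibration to illustrate the idea; TensorRT's entropy calibrator is more sophisticated, and the layer names and data here are hypothetical, not tied to any particular NVDLA flow.)

```python
import numpy as np

def max_calibration_scale(activations, num_bits=8):
    """Symmetric max calibration: map the observed dynamic range
    onto the signed integer range for the given bit width."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8, 7 for int4
    amax = np.abs(activations).max()
    return amax / qmax

# Hypothetical per-layer activations collected from a calibration dataset.
calib_activations = {
    "conv1": np.random.randn(1000).astype(np.float32) * 3.0,
    "conv2": np.random.randn(1000).astype(np.float32) * 0.5,
}

# One scale per tensor -- conceptually what a calibration table contains.
calib_table = {name: max_calibration_scale(act)
               for name, act in calib_activations.items()}
print(calib_table)
```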

mtsanic commented 2 years ago

I don't think NVDLA supports low-precision quantized networks. Even 8-bit (normally quantized) networks have to be compiled with its own compiler. You might be able to achieve pseudo low-precision, i.e. 4-bit values stored in 8-bit data, by providing an appropriate calibration table. However, I haven't tried anything like that. This idea might also run into issues with getting the model to work at all, since some models can't be implemented on NVDLA.
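(To make the "pseudo low-precision" idea concrete, here is a minimal sketch, not tested on NVDLA: weights are quantized to the 4-bit range but stored in int8 containers, so to an int8 pipeline they still look like ordinary 8-bit data while only 16 distinct codes are ever used. The function name and values are illustrative only.)

```python
import numpy as np

def fake_int4_in_int8(x, amax=None):
    """Quantize to 4-bit levels but keep int8 storage:
    only the 16 codes in [-8, 7] are ever produced."""
    if amax is None:
        amax = np.abs(x).max()
    scale = amax / 7.0                       # symmetric 4-bit range
    q = np.clip(np.round(x / scale), -8, 7)  # 4-bit integer codes
    return q.astype(np.int8), scale          # stored in an 8-bit container

weights = np.random.randn(64).astype(np.float32)
q, scale = fake_int4_in_int8(weights)
dequant = q.astype(np.float32) * scale       # what the int8 pipeline would see
print(np.unique(q).size, "distinct codes")   # at most 16
```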