nvdla / hw

RTL, Cmodel, and testbench for NVDLA

NVDLA INT8 Scaling #331

Open hashimSharif opened 4 years ago

hashimSharif commented 4 years ago

Hi,

The NVDLA documentation doesn't clearly describe how the scaling converters need to be programmed for INT8 quantized DNN inference. My specific question/confusion is: how are the scales (i.e., the calibration table) computed before being passed to the NVDLA compiler? The documentation recommends using TensorRT but doesn't explain exactly what each scale means. Here is my understanding. Consider:


quantizedLayerInput = S1 * Input
quantizedWeights    = S2 * W
resultTensor        = S1 * S2 * R                    // INT32 accumulator; R is the real-valued result
INT8ResultTensor    = resultTensor * S3 / (S1 * S2)  // = S3 * R; S3 computed from the layer output distribution
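A minimal NumPy sketch of the chain above, just to make the arithmetic concrete. The scale values S1, S2, S3 are made up for illustration; in practice they would come from calibration (e.g., via TensorRT) over each tensor's value distribution, and this is my reading of the math rather than NVDLA's actual converter pipeline:

```python
import numpy as np

# Hypothetical scales chosen so the example tensors fit in INT8 range.
S1, S2, S3 = 12.7, 25.4, 0.6

def quantize(x, scale):
    # Map real values to INT8: scale, round, saturate.
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
inputs = rng.uniform(-10, 10, size=(4, 4)).astype(np.float32)
weights = rng.uniform(-5, 5, size=(4, 4)).astype(np.float32)

q_in = quantize(inputs, S1)   # quantizedLayerInput = S1 * Input
q_w = quantize(weights, S2)   # quantizedWeights    = S2 * W

# INT32 accumulator: resultTensor ~ S1 * S2 * R
acc = q_in.astype(np.int32) @ q_w.astype(np.int32)

# Requantize the accumulator to INT8 with the combined scale S3 / (S1 * S2)
q_out = np.clip(np.round(acc * (S3 / (S1 * S2))), -128, 127).astype(np.int8)

# Reference: the real-valued result R quantized directly with S3.
ref = quantize(inputs @ weights, S3)

# The two should agree up to a few counts of rounding error.
print(np.max(np.abs(q_out.astype(np.int32) - ref.astype(np.int32))))
```

The key point the example illustrates is that only the combined factor S3 / (S1 * S2) is needed at requantization time, which is why I suspect that is what the compiler expects.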

Each scale is computed as follows:

S_dist = 256 / (dist_max - dist_min)
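A short sketch of computing that scale from calibration samples, using the min–max formula above. The function name and the synthetic calibration data are my own illustration, not an NVDLA or TensorRT API:

```python
import numpy as np

def scale_from_distribution(samples):
    # The scale proposed in the text: 256 quantization levels spread
    # over the observed dynamic range of the tensor's distribution.
    dist_min, dist_max = float(samples.min()), float(samples.max())
    return 256.0 / (dist_max - dist_min)

# Hypothetical calibration samples for one tensor.
rng = np.random.default_rng(42)
activations = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)

S = scale_from_distribution(activations)
```

Note that, as far as I understand, TensorRT-style INT8 calibration typically uses a symmetric mapping (a scale derived from 127 / threshold, with the threshold chosen by entropy calibration rather than the raw min–max range), so the formula above may only be an approximation of what the calibration table actually contains.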

If this understanding is correct, the scale passed to the NVDLA compiler should be:

S3 / (S1 * S2)

Guidance is very much appreciated.

Thanks, Hashim