The NVDLA documentation doesn't clearly describe how the scaling converters need to be programmed for INT8 quantized DNN inference. My question/confusion specifically is: how are the scales (i.e., the calibration table) computed for passing to the NVDLA compiler? The documentation recommends using TensorRT for calibration but doesn't say exactly what the scale means. Here is my understanding; consider:
quantizedLayerInput = S1 * Input
quantizedWeights    = S2 * W
resultTensor        = S1 * S2 * R                      // R = Input * W, the float result
INT8ResultTensor    = resultTensor * S3 / (S1 * S2)    // S3 computed from the layer output distribution
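To sanity-check that reading, here is a minimal NumPy sketch of the pipeline (toy shapes and made-up scales, not NVDLA code; the scales are powers of two and the inputs exact multiples of 1/S1 and 1/S2 so that rounding introduces no error in the demo):

```python
import numpy as np

# Hypothetical scales, chosen so the toy inputs quantize exactly.
# Real calibration scales come from the observed distributions.
S1, S2, S3 = 8.0, 4.0, 2.0

rng = np.random.default_rng(0)
# Float tensors whose values are exact multiples of 1/S1 and 1/S2.
Input = rng.integers(-8, 9, size=(4, 8)) / S1
W = rng.integers(-4, 5, size=(8, 3)) / S2

# Quantize activations and weights to INT8.
q_input = np.clip(np.round(S1 * Input), -128, 127).astype(np.int8)
q_weights = np.clip(np.round(S2 * W), -128, 127).astype(np.int8)

# INT8 x INT8 with INT32 accumulation: this equals S1 * S2 * R,
# where R = Input @ W is the float result.
acc = q_input.astype(np.int32) @ q_weights.astype(np.int32)

# Converter step: rescale the accumulator into the output's INT8
# domain with the combined factor S3 / (S1 * S2).
q_out = np.clip(np.round(acc * (S3 / (S1 * S2))), -128, 127).astype(np.int8)

# Reference: quantize the float result R directly with S3.
ref = np.clip(np.round(S3 * (Input @ W)), -128, 127).astype(np.int8)
```

With exact quantization, `q_out` and `ref` agree element-for-element, which is what makes S3 / (S1 * S2) the right rescale factor for the accumulator.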
Each scale is computed from its tensor's distribution as follows:
S_dist = 256 / (dist_max - dist_min)
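In code, that formula would look like the following (a small sketch with a made-up activation sample; the function name is mine, not from any NVDLA/TensorRT API):

```python
import numpy as np

def dist_scale(tensor):
    """Scale from the tensor's observed range: 256 / (max - min)."""
    return 256.0 / (tensor.max() - tensor.min())

# Hypothetical sampled activations with range [-2.0, 2.0].
acts = np.array([-2.0, -0.5, 0.0, 1.5, 2.0])
print(dist_scale(acts))  # 256 / 4.0 = 64.0
```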
If this understanding is correct, the scale passed to the NVDLA compiler should be:
S3 / (S1 * S2)
Guidance is very much appreciated.
Thanks, Hashim