INT8 quantization?

imchwan commented 3 years ago

Have you tried TensorRT with INT8 quantization? Does your algorithm support generating a calibration table?

If you could share your experience or point me to some examples, that would be appreciated.

talebolano commented 3 years ago

@imchwan I haven't tried int8, but you can implement int8 through the following steps:

Write the paths of the calibration images into calibration.txt, then run:
./makeCudaEngine -i ../../ScaledYOLOv4/yolov4-csp.onnx -o yolov4-csp.trt -c calibration.txt -m 2

It should work.
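
A minimal sketch of the first step, assuming calibration.txt expects one image path per line (that format is not confirmed in this thread, so check how makeCudaEngine reads the file). The helper name write_calibration_list and the image directory are hypothetical:

```python
from pathlib import Path

# Extensions treated as calibration images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_calibration_list(image_dir, out_file="calibration.txt", limit=None):
    """Write the absolute path of every image in image_dir to out_file.

    Assumes the calibrator wants one path per line. Returns the number
    of paths written.
    """
    paths = sorted(
        p.resolve()
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if limit is not None:
        paths = paths[:limit]  # keep the calibration set small if requested
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Calling write_calibration_list("calib_images", limit=500) would produce a calibration.txt that can then be passed with -c as in the command above.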

imchwan commented 3 years ago

Thank you for the quick coaching.

I tried the steps you suggested, but the following error occurred.

ERROR FAILED_EXECUTION: std::exception
load batch 8 to 8
WARNING Explicit batch network detected and batch size specified, use execute without batch size instead.
INTERNAL_ERROR Assertion failed: d.nbDims >= 1
../rtSafe/safeHelpers.cpp:419
Aborting...
Aborting...

I searched for these errors, and the suggested fix was to use TensorRT 7.2, but I'm already using 7.2.1-1+cuda10.2.

So, I will let you know if I succeed with implementing int8.

Thank you!

talebolano commented 3 years ago

@imchwan I have solved this problem. Change line 250 of models.py in the ScaledYOLOv4 csp branch (the large branch is similar) from:

        elif ONNX_EXPORT:
            # Avoid broadcasting for ANE operations
            m = self.na * self.nx * self.ny
            ng = 1. / self.ng.repeat(m, 1)
            grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2)
            anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2)

            p = p.view(m, self.no)
            io = p.sigmoid()
            xy = (io[..., :2] * 2. - 0.5 + grid)
            wh = (io[..., 2:4] * 2) ** 2 * anchor_wh
            xy *= self.stride
            wh *= self.stride          

            return io[...,4:5], io[...,5:],xy , wh # conf cls xy wh

to:

        elif ONNX_EXPORT:
            # Avoid broadcasting for ANE operations
            m = self.na * self.nx * self.ny
            ng = 1. / self.ng.repeat(m, 1)
            grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2)
            anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2)

            p = p.view(m, self.no)
            io = p.sigmoid()
            xy = (io[..., :2] * torch.tensor([2.]) - torch.tensor([.5]) + grid)
            wh = (io[..., 2:4] * torch.tensor([2.])) ** torch.tensor([2.]) * anchor_wh
            xy *= torch.tensor([self.stride]).float()
            wh *= torch.tensor([self.stride]).float()          

            return io[...,4:5], io[...,5:],xy , wh # conf cls xy wh

Then regenerate the ONNX model. After that, the program only prints warnings and no longer raises an error.
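
The rewrite only swaps the Python scalar constants for tensors; the arithmetic in the branch is unchanged. As a sanity check after re-exporting, the decoding for a single cell can be worked by hand. A minimal pure-Python sketch, where decode_cell and all the input values are made up for illustration and anchor_wh is assumed to be in grid units as in the snippet above:

```python
import math


def sigmoid(x):
    """Logistic function, matching p.sigmoid() for a single value."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_cell(tx, ty, tw, th, grid_xy, anchor_wh, stride):
    """Decode one prediction the same way the ONNX_EXPORT branch does.

    xy = (sigmoid(t) * 2 - 0.5 + grid) * stride
    wh = (sigmoid(t) * 2) ** 2 * anchor_wh * stride
    """
    sx, sy, sw, sh = (sigmoid(v) for v in (tx, ty, tw, th))
    x = (sx * 2.0 - 0.5 + grid_xy[0]) * stride
    y = (sy * 2.0 - 0.5 + grid_xy[1]) * stride
    w = (sw * 2.0) ** 2 * anchor_wh[0] * stride
    h = (sh * 2.0) ** 2 * anchor_wh[1] * stride
    return x, y, w, h


# Zero logits give sigmoid = 0.5, so the box sits at the cell centre
# and has exactly the anchor size scaled by the stride.
box = decode_cell(0.0, 0.0, 0.0, 0.0, grid_xy=(3, 4), anchor_wh=(1.5, 2.0), stride=8)
print(box)  # (28.0, 36.0, 12.0, 16.0)
```

Comparing a few cells like this against the xy and wh outputs of the regenerated ONNX model is one way to confirm the export still decodes boxes as before.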

talebolano / TensorRT-Scaled-YOLOv4

INT8 quantization? #5