levipereira / yolov9-qat

Implementation of YOLOv9 QAT optimized for deployment on TensorRT platforms.
Apache License 2.0

Assertion failed: scaleAllPositive && "Scale coefficients must all be positive" #13

Open · Wooho-Moon opened this issue 3 months ago

Wooho-Moon commented 3 months ago

Thanks for the awesome work! I recently fine-tuned my own model based on YOLOv9 and obtained the .pt file successfully. After converting the .pt to ONNX, I tried to deploy the ONNX model with TensorRT. The problem occurs at this step.

[screenshot: TensorRT engine-build log showing the scaleAllPositive assertion failure triggered by a scale factor of 0]

The picture shows a scale factor whose value is 0. I don't know exactly what more I should do. Could you give me some advice?

levipereira commented 3 months ago

It looks like there might be a bug in our PyTorch-Quantization tool, specifically related to the generation of zero or negative scales, which should never occur.

Did you follow the installation process described at installation, or did you set up the environment yourself?
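For anyone hitting this, here is a minimal sketch (assuming the standard pytorch-quantization API and an already-calibrated model) that scans every TensorQuantizer for non-positive amax values before ONNX export; any hit would later surface as the scaleAllPositive assertion in TensorRT:

```python
import torch
from pytorch_quantization import nn as quant_nn

def find_bad_amax(model):
    """Report quantizers whose calibrated amax is zero or negative.

    A non-positive amax produces a non-positive INT8 scale,
    which TensorRT rejects with the scaleAllPositive assertion.
    """
    bad = []
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer) and module._amax is not None:
            if torch.any(module._amax <= 0):
                bad.append((name, module._amax.min().item()))
                print(f"{name}: min amax = {module._amax.min().item()}")
    return bad
```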

Wooho-Moon commented 3 months ago

No, actually I can't install TensorRT version 10, since I have to deploy the TRT model on a Jetson Orin, so I had to set up my own environment. Is it related to the TensorRT version? I'll try to match all of the dependencies in your script except TensorRT, and then I'll let you know the result :)

levipereira commented 3 months ago

Ok. I have a Jetson Orin Nano here. I'll find some free time and test it.

Wooho-Moon commented 3 months ago

The reason I use a lower version of TensorRT is that if I convert the ONNX to TensorRT on a 4090 or 3090, the engine cannot be deployed on the Orin NX. TensorRT engines are not portable across GPU architectures, so an engine built on the server's RTX cards can't be used on the Jetson. Anyway, I'll keep trying!

Wooho-Moon commented 3 months ago

I solved the problem. As you mentioned, the error seems to come from pytorch-quantization. Originally I had installed pytorch-quantization with pip:

```
pip install --no-cache-dir --index-url https://pypi.nvidia.com/ --index-url https://pypi.org/simple pytorch-quantization==2.1.3
```

That pip package appears to have a bug, so I installed the package from source instead. The steps are:

```
git clone https://github.com/NVIDIA/TensorRT.git
cd TensorRT/tools/pytorch-quantization
pip install -r requirements.txt
python setup.py install
```

After installing pytorch-quantization from source, I converted the .pt to ONNX and could deploy it on the Orin NX. Thanks for the impressive work, again.
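As a quick sanity check (a hedged sketch; the printed values are illustrative), you can confirm which pytorch-quantization build is actually being imported before re-running the export:

```python
# Confirm the source-built package shadows any previously installed pip wheel.
import pytorch_quantization
print(pytorch_quantization.__version__)  # e.g. 2.1.3
print(pytorch_quantization.__file__)     # path should point at the source install
```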

levipereira commented 3 months ago

If you don't mind, could you post just the === Performance summary === from running a test on the Orin? https://github.com/levipereira/yolov9-qat?tab=readme-ov-file#benchmark Testing with batch sizes 1, 4, and 8 is enough.

Wooho-Moon commented 3 months ago

> If you don't mind, could you post just the === Performance summary === from running a test on the Orin? https://github.com/levipereira/yolov9-qat?tab=readme-ov-file#benchmark Testing with batch sizes 1, 4, and 8 is enough.

Sure, why not. But I don't have the resources, including an Orin NX, for that report this week. Also, this model is not the same as the original YOLOv9, so I have some additional work to do for the report (e.g., converting the original model and testing it). May I report it next week?

Wooho-Moon commented 3 months ago

Hello, you want a report of the Jetson Orin NX results, right? I downloaded the https://github.com/WongKinYiu/yolov9/releases/download/v0.1/yolov9-c-converted.pt weights, fine-tuned them with QAT using this repo's training code, then exported to ONNX and converted the ONNX to TRT. The report follows.

--batch 1--

```
[06/03/2024-14:56:40] [I] === Performance summary ===
[06/03/2024-14:56:40] [I] Throughput: 115.278 qps
[06/03/2024-14:56:40] [I] Latency: min = 8.76172 ms, max = 8.80664 ms, mean = 8.7761 ms, median = 8.77515 ms, percentile(90%) = 8.78369 ms, percentile(95%) = 8.78711 ms, percentile(99%) = 8.79541 ms
[06/03/2024-14:56:40] [I] Enqueue Time: min = 0.0576172 ms, max = 0.105103 ms, mean = 0.0632496 ms, median = 0.0622559 ms, percentile(90%) = 0.0668945 ms, percentile(95%) = 0.0701904 ms, percentile(99%) = 0.0836182 ms
[06/03/2024-14:56:40] [I] H2D Latency: min = 0.0683594 ms, max = 0.0820312 ms, mean = 0.0704411 ms, median = 0.0703125 ms, percentile(90%) = 0.0712891 ms, percentile(95%) = 0.0734863 ms, percentile(99%) = 0.0766602 ms
[06/03/2024-14:56:40] [I] GPU Compute Time: min = 8.65234 ms, max = 8.69727 ms, mean = 8.66557 ms, median = 8.66455 ms, percentile(90%) = 8.67236 ms, percentile(95%) = 8.67676 ms, percentile(99%) = 8.68359 ms
[06/03/2024-14:56:40] [I] D2H Latency: min = 0.0380859 ms, max = 0.0419922 ms, mean = 0.0400818 ms, median = 0.0400391 ms, percentile(90%) = 0.0410156 ms, percentile(95%) = 0.0410156 ms, percentile(99%) = 0.0415039 ms
[06/03/2024-14:56:40] [I] Total Host Walltime: 10.028 s
[06/03/2024-14:56:40] [I] Total GPU Compute Time: 10.0174 s
```

--batch 4--

```
[06/03/2024-14:42:23] [I] === Performance summary ===
[06/03/2024-14:42:23] [I] Throughput: 38.6818 qps
[06/03/2024-14:42:23] [I] Latency: min = 25.7783 ms, max = 36.1599 ms, mean = 26.2126 ms, median = 25.8057 ms, percentile(90%) = 25.8514 ms, percentile(95%) = 29.2401 ms, percentile(99%) = 35.4739 ms
[06/03/2024-14:42:23] [I] Enqueue Time: min = 0.0732422 ms, max = 0.290527 ms, mean = 0.111888 ms, median = 0.105286 ms, percentile(90%) = 0.144043 ms, percentile(95%) = 0.154053 ms, percentile(99%) = 0.19165 ms
[06/03/2024-14:42:23] [I] H2D Latency: min = 0.247559 ms, max = 0.299316 ms, mean = 0.255223 ms, median = 0.25293 ms, percentile(90%) = 0.264404 ms, percentile(95%) = 0.267334 ms, percentile(99%) = 0.275391 ms
[06/03/2024-14:42:23] [I] GPU Compute Time: min = 25.3561 ms, max = 35.7307 ms, mean = 25.7854 ms, median = 25.3779 ms, percentile(90%) = 25.4128 ms, percentile(95%) = 28.8115 ms, percentile(99%) = 35.0488 ms
[06/03/2024-14:42:23] [I] D2H Latency: min = 0.140625 ms, max = 0.21582 ms, mean = 0.171945 ms, median = 0.170898 ms, percentile(90%) = 0.176758 ms, percentile(95%) = 0.179688 ms, percentile(99%) = 0.194092 ms
[06/03/2024-14:42:23] [I] Total Host Walltime: 10.0823 s
[06/03/2024-14:42:23] [I] Total GPU Compute Time: 10.0563 s
```

--batch 8--

```
[06/03/2024-14:51:40] [I] === Performance summary ===
[06/03/2024-14:51:40] [I] Throughput: 20.5566 qps
[06/03/2024-14:51:40] [I] Latency: min = 48.0117 ms, max = 63.1135 ms, mean = 49.2359 ms, median = 48.1123 ms, percentile(90%) = 53.9458 ms, percentile(95%) = 57.438 ms, percentile(99%) = 61.1533 ms
[06/03/2024-14:51:40] [I] Enqueue Time: min = 0.0930176 ms, max = 0.269531 ms, mean = 0.128413 ms, median = 0.127197 ms, percentile(90%) = 0.145752 ms, percentile(95%) = 0.153564 ms, percentile(99%) = 0.24292 ms
[06/03/2024-14:51:40] [I] H2D Latency: min = 0.478516 ms, max = 0.558105 ms, mean = 0.489197 ms, median = 0.488281 ms, percentile(90%) = 0.494629 ms, percentile(95%) = 0.502686 ms, percentile(99%) = 0.539551 ms
[06/03/2024-14:51:40] [I] GPU Compute Time: min = 47.2241 ms, max = 62.2919 ms, mean = 48.4187 ms, median = 47.2949 ms, percentile(90%) = 53.1313 ms, percentile(95%) = 56.6108 ms, percentile(99%) = 60.333 ms
[06/03/2024-14:51:40] [I] D2H Latency: min = 0.273438 ms, max = 0.368652 ms, mean = 0.328068 ms, median = 0.326172 ms, percentile(90%) = 0.336914 ms, percentile(95%) = 0.339844 ms, percentile(99%) = 0.355957 ms
[06/03/2024-14:51:40] [I] Total Host Walltime: 10.167 s
[06/03/2024-14:51:40] [I] Total GPU Compute Time: 10.1195 s
```
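For context, the per-image throughput works out to about 115 img/s at batch 1 (115.278 qps × 1), 155 img/s at batch 4 (38.6818 × 4 ≈ 154.7), and 164 img/s at batch 8 (20.5566 × 8 ≈ 164.5), so batching improves per-image throughput on the Orin NX by roughly 43% over batch 1.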

Wooho-Moon commented 3 months ago

@levipereira Sir, do you have any plans to develop YOLOv9-QAT for the INT4 format?

realVegetable commented 3 months ago

> I solved the problem. As you mentioned, the error seems to come from pytorch-quantization. Originally I had installed pytorch-quantization with pip: pip install --no-cache-dir --index-url https://pypi.nvidia.com/ --index-url https://pypi.org/simple pytorch-quantization==2.1.3
>
> That pip package appears to have a bug, so I installed the package from source instead. The steps are:
>
> ```
> git clone https://github.com/NVIDIA/TensorRT.git
> cd TensorRT/tools/pytorch-quantization
> pip install -r requirements.txt
> python setup.py install
> ```
>
> After installing pytorch-quantization from source, I converted the .pt to ONNX and could deploy it on the Orin NX. Thanks for the impressive work, again.

@Wooho-Moon @levipereira I encountered the same problem and reinstalled pytorch-quantization this way, but I still hit the same problem when I try to convert the ONNX model to a TensorRT engine again following @levipereira's guidance. Do you have a better solution?

levipereira commented 2 months ago

> @levipereira Sir, do you have any plans to develop YOLOv9-QAT for the INT4 format?

No.

DXL64 commented 1 month ago

You can rewrite the compute_amax function like this to fix the issue:

```python
import torch
from pytorch_quantization import calib
from pytorch_quantization import nn as quant_nn

def compute_amax(model, **kwargs):
    # Load the calibrated amax into every quantizer, then clamp
    # near-zero values so no quantizer produces a zero (or negative) scale.
    for name, module in model.named_modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                if isinstance(module._calibrator, calib.MaxCalibrator):
                    module.load_calib_amax()
                else:
                    module.load_calib_amax(**kwargs)
                # `device` is the model's target device, defined elsewhere in the script.
                module._amax = module._amax.to(device)
                if torch.any(module._amax <= 1e-6):
                    module._amax = torch.clamp(module._amax, min=1e-6)
```
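For completeness, a sketch of how the patched compute_amax would typically be invoked after the calibration data has been run through the model (the method/percentile arguments follow the usual pytorch-quantization histogram-calibration convention; the exact values are illustrative):

```python
# With calibration statistics already collected, load and sanitize the amax values.
with torch.no_grad():
    compute_amax(model, method="percentile", percentile=99.99)
```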