isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

[Question] Running ./main takes so long #23

Closed leviethung2103 closed 3 years ago

leviethung2103 commented 3 years ago

Hi author,

Thank you so much for creating this repository. I have a problem: when I run the ./main command, it takes a long time to create the engine model.

The process looks like this:

```
./main
Creating builder
Creating model
```

It takes several minutes to complete. Can we accelerate this step by using all of the CPU's cores? Thank you in advance.

philipp-schmidt commented 3 years ago

Hi, running main asks TensorRT to run a series of benchmarks on your GPU. TensorRT ships different implementations of the layers used in the network, times them, and chooses the fastest one for your scenario. It is perfectly fine that this takes a few minutes, and how long it takes depends on your GPU/hardware; the CPU is not the bottleneck here. There are settings in TensorRT to accelerate the process, which basically tell it to run fewer timing samples, ignore certain implementations, reuse timing results for layers of the same type (even though they might have different kernel sizes and so on), etc. DeepStream, for example, is faster at optimizing this network. I don't believe that making this process faster is a huge benefit, as it only has to be done once, but feel free to dive into the TensorRT documentation (and also feel free to PR your contributions back if you want).
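
For reference, here is a minimal sketch of the kind of builder settings alluded to above, assuming the TensorRT 8.x C++ API (`setAvgTimingIterations`, `setTacticSources`, and the timing cache). None of these calls appear in this repository's code, and `timing.cache` is a hypothetical file name; treat the values as illustrative trade-offs, not recommendations:

```cpp
// Sketch only: ways to shorten TensorRT's engine build, assuming TensorRT 8.x.
// "timing.cache" is a hypothetical file name, not something this repo uses.
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

void speedUpBuild(nvinfer1::IBuilderConfig& config)
{
    // Fewer timing iterations per tactic: faster build, possibly a slightly
    // slower engine.
    config.setAvgTimingIterations(1);

    // Restrict which tactic sources TensorRT benchmarks (here: cuBLAS only,
    // skipping e.g. cuDNN kernels).
    config.setTacticSources(
        1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS));

    // Reuse timing results across builds via a timing cache. An empty blob
    // (nullptr, 0) creates a fresh cache on the first run.
    std::vector<char> blob;
    std::ifstream in("timing.cache", std::ios::binary);
    if (in)
        blob.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    nvinfer1::ITimingCache* cache =
        config.createTimingCache(blob.data(), blob.size());
    config.setTimingCache(*cache, /*ignoreMismatch=*/false);
}
```

In practice the timing cache gives the biggest win for repeated builds: serialize it with `cache->serialize()` after a successful build and write it to disk, and subsequent builds on the same GPU skip most of the benchmarking.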