isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

[Question] Running ./main takes so long #23

Closed leviethung2103 closed 3 years ago

leviethung2103 commented 3 years ago

Hi author,

Thank you so much for creating this repository. I have a problem: when I run the ./main command, it takes a long time to create the engine model.

The process looks like this:

```
./main
Creating builder
Creating model
```

It takes several minutes to complete. Can we accelerate this step by using all of the CPU's cores? Thank you in advance.

philipp-schmidt commented 3 years ago

Hi, running main asks TensorRT to run a series of benchmarks on your GPU. TensorRT ships different implementations of the layers used in the network, times them, and chooses the fastest one for your scenario. It is perfectly fine that this takes a few minutes, and how long it takes depends on your GPU/hardware; the CPU is not the bottleneck here. There are settings in TensorRT to accelerate the process, which basically tell it to run fewer timing samples, ignore certain implementations, reuse timing results for layers of the same type (even though they might have different kernel sizes and so on), etc. DeepStream, for example, is faster at optimizing this network. I don't believe that making this process faster is a huge benefit, as it only has to be done once, but feel free to dive into the TensorRT documentation (and also feel free to PR your contributions back if you want).
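
For reference, here is a minimal sketch of the kind of builder settings alluded to above, assuming the TensorRT 8.x C++ API (`setAvgTimingIterations`, `setTacticSources`, and the timing cache). None of these calls appear in this repository's code, and `timing.cache` is a hypothetical file name; treat the values as illustrative trade-offs, not recommendations:

```cpp
// Sketch only: ways to shorten TensorRT's engine build, assuming TensorRT 8.x.
// "timing.cache" is a hypothetical file name, not something this repo uses.
#include <NvInfer.h>
#include <fstream>
#include <iterator>
#include <vector>

void speedUpBuild(nvinfer1::IBuilderConfig& config)
{
    // Fewer timing iterations per tactic: faster build, possibly a slightly
    // slower engine.
    config.setAvgTimingIterations(1);

    // Restrict which tactic sources TensorRT benchmarks (here: cuBLAS only,
    // skipping e.g. cuDNN kernels).
    config.setTacticSources(
        1U << static_cast<uint32_t>(nvinfer1::TacticSource::kCUBLAS));

    // Reuse timing results across builds via a timing cache. An empty blob
    // (nullptr, 0) creates a fresh cache on the first run.
    std::vector<char> blob;
    std::ifstream in("timing.cache", std::ios::binary);
    if (in)
        blob.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    nvinfer1::ITimingCache* cache =
        config.createTimingCache(blob.data(), blob.size());
    config.setTimingCache(*cache, /*ignoreMismatch=*/false);
}
```

In practice the timing cache gives the biggest win for repeated builds: serialize it with `cache->serialize()` after a successful build and write it to disk, and subsequent builds on the same GPU skip most of the benchmarking.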