isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Dynamic batching inference time #58

Closed MoaazAbdulrahman closed 2 years ago

MoaazAbdulrahman commented 2 years ago

Thank you for your effort building this repo. I am facing an issue with inference time when I run the model with a batch size larger than 1. When I set the batch size to 4 and pass 4 images to the model, it takes about 200 ms. However, when I set the batch size to 4 and pass only 1 image to the model, it still takes about 195 ms.

I care about inference time and want to use batching dynamically at run time by passing different batch sizes while keeping the inference time to a minimum.

Is it possible?
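For context, this is roughly what I mean by passing different batch sizes at run time (a minimal sketch; the tensor names "input" / "detections", the 608x608 input size, and the gRPC port are assumptions, not necessarily this repo's exact values):

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Assumed names/shapes -- adjust to your model's config.pbtxt
URL = "localhost:8001"
MODEL = "yolov4"
INPUT_NAME = "input"          # assumed input tensor name
OUTPUT_NAME = "detections"    # assumed output tensor name
C, H, W = 3, 608, 608         # assumed network input size

client = grpcclient.InferenceServerClient(url=URL)

def infer(batch):
    """Send one request with whatever batch size `batch` has."""
    inp = grpcclient.InferInput(INPUT_NAME, batch.shape, "FP32")
    inp.set_data_from_numpy(batch)
    out = grpcclient.InferRequestedOutput(OUTPUT_NAME)
    result = client.infer(model_name=MODEL, inputs=[inp], outputs=[out])
    return result.as_numpy(OUTPUT_NAME)

# Batch size varies per call; the model config must allow it (max_batch_size >= 4)
single = np.random.rand(1, C, H, W).astype(np.float32)
four = np.random.rand(4, C, H, W).astype(np.float32)
print(infer(single).shape)
print(infer(four).shape)
```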

philipp-schmidt commented 2 years ago

We all care about the inference time, that's why we made this repo ;)

You can have a look at dynamic batching in the Triton documentation; it's easy to set up.

You can tell Triton which batch sizes it should prefer to build dynamically and how long it should wait for multiple requests to be combined into a batch.
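For example, a minimal config.pbtxt sketch with dynamic batching enabled (the model name and the specific values are placeholders, not this repo's exact config):

```
name: "yolov4"
platform: "tensorrt_plan"
max_batch_size: 8

dynamic_batching {
  # Batch sizes Triton should try to build from queued requests
  preferred_batch_size: [ 4, 8 ]
  # How long to hold a request while waiting for others to batch with it
  max_queue_delay_microseconds: 100
}
```

With this, each client can keep sending individual requests and Triton groups requests that arrive within the queue delay into one batch on the server side, so you get the throughput of batched inference without building batches in the client.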