ultralytics / ultralytics

NEW - YOLOv8 πŸš€ in PyTorch > ONNX > OpenVINO > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

About TensorRT speed test #13299

Open BarryGUN opened 1 month ago

BarryGUN commented 1 month ago

Search before asking

Question

I have some questions about your inference latency with TensorRT acceleration. Did you export your model to an engine and test it with C++ code? Does the reported latency include NMS or not?

Additional

No response

glenn-jocher commented 1 month ago

@BarryGUN hello!

Thank you for reaching out with your questions about TensorRT acceleration with YOLOv8.

Yes, we export the model to the TensorRT .engine format for testing. These tests are typically conducted using Python, but the results should be comparable when using C++ as both utilize the TensorRT runtime.

Regarding latency, the reported inference times include all processing steps, including Non-Maximum Suppression (NMS), unless explicitly stated otherwise. This ensures that the latency figures we provide reflect the total time required to process input and produce final detections.
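To make the "latency includes NMS" point concrete, here is a minimal, hedged sketch of how an end-to-end timing measurement can wrap all pipeline stages. The `preprocess`, `infer`, and `nms` functions below are hypothetical stand-ins, not the Ultralytics implementation:

```python
import time

# Hypothetical stand-ins for the real pipeline stages.
def preprocess(image):
    return image  # a real pipeline would resize/normalize here

def infer(batch):
    return [(0, 0, 10, 10, 0.9, 0)]  # raw boxes from the engine

def nms(predictions, iou_threshold=0.45):
    return predictions  # a real pipeline would filter overlapping boxes

def end_to_end_latency_ms(image):
    """Time the full pipeline: preprocessing + inference + NMS."""
    start = time.perf_counter()
    batch = preprocess(image)
    raw = infer(batch)
    detections = nms(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return detections, elapsed_ms

detections, latency = end_to_end_latency_ms(image=None)
print(f"{len(detections)} detections in {latency:.3f} ms")
```

Because the timer starts before preprocessing and stops after NMS, the measured figure reflects the total time to go from input to final detections, matching the convention described above.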

If you have any more specific scenarios or configurations in mind, feel free to share, and I'll be happy to provide more detailed insights!

BarryGUN commented 1 month ago

Thank you for answering my question, I got it.

glenn-jocher commented 1 month ago

@BarryGUN You're welcome! If you have any more questions or need further assistance in the future, feel free to reach out. Happy coding! 😊

BarryGUN commented 1 month ago

I also want to know whether, in the speed test, batch-size=32 depends on the image count of the COCO2017 test dataset, since 20288 is divisible by 32.

glenn-jocher commented 1 month ago

@BarryGUN Great question!

The batch size of 32 used in our speed tests does not directly depend on the total number of images in the COCO2017 dataset. Instead, it's chosen based on what we've found to provide a good balance between memory usage and computational efficiency on the GPU. This batch size is generally a common choice for performance testing as it can fully utilize the GPU capabilities without exceeding memory limits for most modern GPUs.

If you have specific hardware or constraints, you might consider adjusting the batch size to better fit your setup. Let me know if you need tips on how to choose the optimal batch size for your tests! 😊
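For reference, a quick check of how a dataset splits into batches at a given batch size (the 20288 figure is taken from the question above; the 5000 figure assumes the COCO val2017 split):

```python
def batch_counts(num_images, batch_size):
    """Return (full_batches, leftover_images) for a given dataset size."""
    return divmod(num_images, batch_size)

# The 20288-image count from the question divides evenly: 634 full batches.
print(batch_counts(20288, 32))  # (634, 0)

# A 5000-image split would leave a partial final batch of 8 images.
print(batch_counts(5000, 32))   # (156, 8)
```

A leftover partial batch is harmless for throughput measurements; the batch size itself is picked for GPU utilization, not to divide the dataset evenly.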

BarryGUN commented 1 month ago

I got it, thank you!

glenn-jocher commented 1 month ago

@BarryGUN You're welcome! If you have any more questions in the future, don't hesitate to ask. Happy coding! 😊

github-actions[bot] commented 4 days ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐