Open · darouwan opened this issue 1 week ago
Hello @darouwan, thank you for bringing this to our attention! We appreciate your interest in utilizing TensorRT to optimize inference speeds.
To better investigate this issue, we recommend providing additional details or testing with a minimum reproducible example (MRE). This helps us reproduce and address the behavior you're noticing more effectively.
Here are a few suggestions to verify and troubleshoot your setup:
Ensure you've upgraded to the latest version of the ultralytics package, as well as its requirements, in a Python>=3.8 environment:
pip install -U ultralytics
Make sure your inference environment is correctly set up with all dependencies, such as CUDA, cuDNN, and PyTorch, optimized for your GPU.
Double-check that you're using the model in TensorRT format correctly and that the TensorRT model was successfully exported. For assistance with exporting models, refer to the Docs on Exporting Models.
Also, consider testing with a pre-built environment to ensure everything is configured correctly; YOLO may be run in any of the verified environments.
For community support or additional resources, consider joining the community discussions.
Our automated continuous integration tests, verified across platforms like macOS, Windows, and Ubuntu, validate the correctness of all YOLO modes and tasks.
We hope this helps. An Ultralytics engineer will review and assist further soon. Thank you again for using Ultralytics!
You should be using perf_counter to measure time
@Y-T-G My current code is:
from ultralytics import YOLO
from loguru import logger
import datetime
import cv2
import time

# Time 2000 predictions with the PyTorch model
model = YOLO("yolo11m.pt")
img = cv2.imread('bus.jpg')
start_time = time.perf_counter()
for _ in range(2000):
    model.predict(img)
running_time1 = time.perf_counter() - start_time

# Time 2000 predictions with the TensorRT engine
model = YOLO("yolo11m.engine", task='detect')
start_time = time.perf_counter()
for _ in range(2000):
    model.predict(img)
running_time2 = time.perf_counter() - start_time

logger.info(f"running time: {running_time1} s")
logger.info(f"running time: {running_time2} s")
The result is:
2025-02-04 09:36:49.329 | INFO | __main__:<module>:22 - running time: 14.48436415195465 s
2025-02-04 09:36:49.329 | INFO | __main__:<module>:23 - running time: 13.611100463196635 s
Still a very minor improvement
First prediction is slower, so you need to exclude that, and use verbose=False.
And export with half=True and nms=True
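Combining those suggestions, a revised benchmark could look like the sketch below. This is a sketch only: the warm-up count of 10 is an illustrative assumption, and nms=True at export requires a recent ultralytics release.

import time

import cv2
from ultralytics import YOLO

# Re-export the engine with FP16 and end-to-end NMS
# (assumes a recent ultralytics release that supports nms=True for engines)
YOLO("yolo11m.pt").export(format="engine", half=True, nms=True)

img = cv2.imread("bus.jpg")

def bench(weights, n=2000, warmup=10):
    model = YOLO(weights, task="detect")
    for _ in range(warmup):  # exclude the slow first predictions
        model.predict(img, verbose=False)
    start = time.perf_counter()
    for _ in range(n):
        model.predict(img, verbose=False)
    return time.perf_counter() - start

print(f"PyTorch : {bench('yolo11m.pt'):.2f} s")
print(f"TensorRT: {bench('yolo11m.engine'):.2f} s")

With the first predictions excluded and per-frame logging disabled, the remaining time reflects mostly inference, so the gap between the backends becomes visible.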
@Y-T-G half=True works, but it will decrease the accuracy, won't it?
And nms=True fails with: argument 'nms' is not supported for format='engine'
You need to update to the latest version.
half=True works, but it will decrease the accuracy, won't it?
Not noticeably
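If you want to quantify it, you can compare validation metrics between the two formats. A minimal sketch, assuming the built-in coco8 sample dataset (a tiny dataset; a full validation set would give a more reliable comparison):

from ultralytics import YOLO

# Validate both formats on the same data and compare mAP50-95.
pt_map = YOLO("yolo11m.pt").val(data="coco8.yaml").box.map
trt_map = YOLO("yolo11m.engine", task="detect").val(data="coco8.yaml").box.map
print(f"PyTorch mAP50-95: {pt_map:.4f}  TensorRT FP16 mAP50-95: {trt_map:.4f}")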
Search before asking
Question
I have tried to use TensorRT to speed up inference, but it has no impact. Is this normal?
The way I export to TensorRT:
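(The original snippet was not captured here; based on the yolo11m.engine file used later in the thread, it was presumably a call along these lines, shown as an assumed reconstruction:)

from ultralytics import YOLO

# Assumed export call, inferred from the yolo11m.engine file used below
model = YOLO("yolo11m.pt")
model.export(format="engine")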
The way I test speed:
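(Again an assumed reconstruction, not the original snippet; the datetime-based timing and the iteration count are guesses, consistent with the imports in the code posted above and with the later perf_counter advice:)

import datetime

import cv2
from ultralytics import YOLO

img = cv2.imread("bus.jpg")

for weights in ("yolo11m.pt", "yolo11m.engine"):
    model = YOLO(weights, task="detect")
    start = datetime.datetime.now()
    for _ in range(1000):  # iteration count assumed
        model.predict(img)
    # datetime is wall-clock time; time.perf_counter() is preferred for benchmarks
    print(weights, (datetime.datetime.now() - start).total_seconds(), "s")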
The results are both around 7 seconds.
Environment: Ultralytics 8.3.18 🚀 Python-3.10.12 torch-2.4.1+cu124 CUDA:0 (NVIDIA RTX 4000 Ada Generation, 20040MiB)
Setup complete ✅ (24 CPUs, 62.4 GB RAM, 143.4/913.8 GB disk)
OS: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
Environment: Linux
Python: 3.10.12
Install: git
RAM: 62.44 GB
Disk: 143.4/913.8 GB
CPU: Intel Xeon w5-3425
CPU count: 24
GPU: NVIDIA RTX 4000 Ada Generation, 20040MiB
GPU count: 2
CUDA: 12.4
numpy ✅ 1.26.3>=1.23.0
matplotlib ✅ 3.9.2>=3.3.0
opencv-python ✅ 4.10.0.84>=4.6.0
pillow ✅ 10.2.0>=7.1.2
pyyaml ✅ 6.0.2>=5.3.1
requests ✅ 2.32.3>=2.23.0
scipy ✅ 1.14.1>=1.4.1
torch ✅ 2.4.1+cu124>=1.8.0
torchvision ✅ 0.19.1+cu124>=0.9.0
tqdm ✅ 4.66.5>=4.64.0
psutil ✅ 6.0.0
py-cpuinfo ✅ 9.0.0
pandas ✅ 2.2.3>=1.1.4
seaborn ✅ 0.13.2>=0.11.0
ultralytics-thop ✅ 2.0.9>=2.0.0
torch ✅ 2.4.1+cu124!=2.4.0,>=1.8.0; sys_platform == "win32"
Additional
No response