Ultralytics YOLO11 πŸš€
https://docs.ultralytics.com
GNU Affero General Public License v3.0

TensorRT cannot speed up inference time well #18973

Open darouwan opened 1 week ago

darouwan commented 1 week ago


Question

I have tried using TensorRT to speed up inference, but it has no impact. Is this normal?

How I export to TensorRT:

from ultralytics import YOLO

model = YOLO("yolo11m.pt")

model.export(format="engine")

How I test the speed:

from ultralytics import YOLO
from loguru import logger
import datetime
import cv2

model = YOLO("yolo11m.pt")
start_time = datetime.datetime.today()
img = cv2.imread('bus.jpg')
for _ in range(1000):
    model.predict(img)
running_time1 = (datetime.datetime.today() - start_time).seconds

model = YOLO("yolo11m.engine", task='detect')
start_time = datetime.datetime.today()
for _ in range(1000):
    model.predict(img)
running_time2 = (datetime.datetime.today() - start_time).seconds

logger.info(f"running time: {running_time1}")
logger.info(f"running time: {running_time2}")

The results are both 7 seconds.

Environment:

Ultralytics 8.3.18 πŸš€ Python-3.10.12 torch-2.4.1+cu124 CUDA:0 (NVIDIA RTX 4000 Ada Generation, 20040MiB)
Setup complete βœ… (24 CPUs, 62.4 GB RAM, 143.4/913.8 GB disk)

OS            Linux-6.8.0-47-generic-x86_64-with-glibc2.35
Environment   Linux
Python        3.10.12
Install       git
RAM           62.44 GB
Disk          143.4/913.8 GB
CPU           Intel Xeon w5-3425
CPU count     24
GPU           NVIDIA RTX 4000 Ada Generation, 20040MiB
GPU count     2
CUDA          12.4

numpy βœ… 1.26.3>=1.23.0
matplotlib βœ… 3.9.2>=3.3.0
opencv-python βœ… 4.10.0.84>=4.6.0
pillow βœ… 10.2.0>=7.1.2
pyyaml βœ… 6.0.2>=5.3.1
requests βœ… 2.32.3>=2.23.0
scipy βœ… 1.14.1>=1.4.1
torch βœ… 2.4.1+cu124>=1.8.0
torchvision βœ… 0.19.1+cu124>=0.9.0
tqdm βœ… 4.66.5>=4.64.0
psutil βœ… 6.0.0
py-cpuinfo βœ… 9.0.0
pandas βœ… 2.2.3>=1.1.4
seaborn βœ… 0.13.2>=0.11.0
ultralytics-thop βœ… 2.0.9>=2.0.0
torch βœ… 2.4.1+cu124!=2.4.0,>=1.8.0; sys_platform == "win32"

Additional

No response

UltralyticsAssistant commented 1 week ago

πŸ‘‹ Hello @darouwan, thank you for bringing this to our attention πŸš€! We appreciate your interest in utilizing TensorRT to optimize inference speeds.

To better investigate this issue, we recommend providing additional details or testing with a minimum reproducible example (MRE). This helps us reproduce and address the behavior you’re noticing more effectively.

Here are a few suggestions to verify and troubleshoot your setup:

  1. Ensure you've upgraded to the latest version of the ultralytics package, as well as its requirements, in a Python>=3.8 environment:

    pip install -U ultralytics
  2. Make sure your inference environment is correctly set up with all dependencies, such as CUDA, cuDNN, and PyTorch, optimized for your GPU.

  3. Double-check that you’re using the model in TensorRT format correctly and that the TensorRT model was successfully exported (see the sanity-check sketch after this list). For assistance with exporting models, refer to the Docs on Exporting Models.
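A quick sanity-check sketch for points 2 and 3, assuming the yolo11m.engine and bus.jpg files from the original post (illustrative, not an official snippet):

import torch

print(torch.__version__, torch.cuda.is_available())  # CUDA-enabled PyTorch present?

import tensorrt

print(tensorrt.__version__)  # TensorRT Python bindings installed?

from ultralytics import YOLO

model = YOLO("yolo11m.engine", task="detect")  # loads only if the export succeeded
model.predict("bus.jpg")  # single test inference on the GPU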

Also, consider testing with a pre-built, verified environment to ensure everything is configured correctly.

For community support or additional resources, consider joining the community discussions.

Our automated continuous integration tests, verified across platforms like macOS, Windows, and Ubuntu, validate correctness of all YOLO modes and tasks. For current build status, see the Ultralytics CI badge.

We hope this helps πŸ”. An Ultralytics engineer will review and assist further soon. Thank you again for using Ultralytics! πŸš€

Y-T-G commented 1 week ago

You should be using time.perf_counter to measure time.
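For context, a minimal sketch of why this matters: subtracting two datetime objects yields a timedelta, whose .seconds attribute truncates to whole seconds, while time.perf_counter returns float seconds from a high-resolution monotonic clock.

import datetime
import time

delta = datetime.timedelta(seconds=7.9)
print(delta.seconds)  # 7 -- the fractional part is silently dropped

start = time.perf_counter()
time.sleep(0.25)  # stand-in for real work
print(time.perf_counter() - start)  # ~0.25, with sub-microsecond resolution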

darouwan commented 6 days ago

@Y-T-G My current code is:

from ultralytics import YOLO
from loguru import logger
import datetime
import cv2
import time

model = YOLO("yolo11m.pt")
img = cv2.imread('bus.jpg')
start_time = time.perf_counter()
for _ in range(2000):
    model.predict(img)
running_time1 = time.perf_counter() - start_time

model = YOLO("yolo11m.engine", task='detect')
start_time = time.perf_counter()
for _ in range(2000):
    model.predict(img)
running_time2 = time.perf_counter() - start_time

logger.info(f"running time: {running_time1} s")
logger.info(f"running time: {running_time2} s")

The result is

2025-02-04 09:36:49.329 | INFO     | __main__:<module>:22 - running time: 14.48436415195465 s
2025-02-04 09:36:49.329 | INFO     | __main__:<module>:23 - running time: 13.611100463196635 s

Still only a very minor improvement.

Y-T-G commented 6 days ago

The first prediction is slower, so you need to exclude it from the timing.

Y-T-G commented 6 days ago

and use verbose=False
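Putting these two suggestions together, a benchmarking sketch (assuming the engine file and image from the posts above):

import time

import cv2
from ultralytics import YOLO

model = YOLO("yolo11m.engine", task="detect")
img = cv2.imread("bus.jpg")

model.predict(img, verbose=False)  # warm-up: first call pays one-time setup costs

start = time.perf_counter()
for _ in range(2000):
    model.predict(img, verbose=False)  # verbose=False skips per-image console logging
print(f"{time.perf_counter() - start:.3f} s for 2000 timed runs")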

Y-T-G commented 6 days ago

And export with half=True and nms=True
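For reference, the export call would then look something like this (a sketch; as the follow-up comments note, nms=True with format="engine" is only supported in recent ultralytics releases):

from ultralytics import YOLO

model = YOLO("yolo11m.pt")
# FP16 engine with NMS baked into the exported graph; both flags cut
# per-image latency at inference time.
model.export(format="engine", half=True, nms=True)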

darouwan commented 3 hours ago

@Y-T-G half=True works, but won't it decrease the accuracy? And with nms=True I get: argument 'nms' is not supported for format='engine'.

Y-T-G commented 3 hours ago

You need to update to the latest version.

Y-T-G commented 3 hours ago

half=True works, but won't it decrease the accuracy?

Not noticeably