laugh12321 / TensorRT-YOLO

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.
https://github.com/laugh12321/TensorRT-YOLO
GNU General Public License v3.0

[Question]: How to Perform Inference Using WongKinYiu/yolov9 Exported Model #15

Closed UcanYusuf closed 5 months ago

UcanYusuf commented 5 months ago

Hello, I trained a custom model. After that, I converted my weights to .onnx and .engine formats using export.py from https://github.com/WongKinYiu/yolov9. Now I am trying to use the .engine file with python3 detect.py -e best.engine -o output -i /path, but I get an error like:

num_detections = int(outputs['num_detections'][idx])
KeyError: 'num_detections'

laugh12321 commented 5 months ago

The names of the NMS output heads in the ONNX exported from WongKinYiu/yolov9 are inconsistent with the names this project expects. There are two solutions: either re-export the ONNX using the python/export/yolov9/export.py script from this project and then run inference, or modify the output head names in https://github.com/laugh12321/TensorRT-YOLO/blob/main/python/infer/yolo.py#L129-L153
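For illustration, here is a minimal remapping sketch for the second option. It assumes the WongKinYiu export names its NMS outputs num_dets, det_boxes, det_scores and det_classes, and that the target names other than num_detections match the keys used in yolo.py#L129-L153; verify both sides against your actual ONNX (for example with Netron) before relying on it:

# Hypothetical mapping from the WongKinYiu/yolov9 export names (left) to the
# names this project expects (right); confirm both against your actual model.
NAME_MAP = {
    'num_dets': 'num_detections',
    'det_boxes': 'detection_boxes',
    'det_scores': 'detection_scores',
    'det_classes': 'detection_classes',
}

# Rename the raw output dictionary before the post-processing code looks up keys.
outputs = {NAME_MAP.get(name, name): value for name, value in outputs.items()}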

UcanYusuf commented 5 months ago

trtexec --onnx=best.onnx --saveEngine=best.engine --minShapes=images:1x3x640x640 --optShapes=images:4x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --memPoolSize=workspace:1000

[04/16/2024-15:31:32] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[04/16/2024-15:31:33] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/16/2024-15:39:10] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[04/16/2024-15:39:10] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[04/16/2024-15:39:10] [W] [TRT] Check verbose logs for the list of affected weights.
[04/16/2024-15:39:10] [W] [TRT] - 186 weights are affected by this issue: Detected subnormal FP16 values.
[04/16/2024-15:39:10] [W] [TRT] - 28 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[04/16/2024-15:39:11] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
[04/16/2024-15:39:14] [W] * GPU compute time is unstable, with coefficient of variance = 1.05411%.
[04/16/2024-15:39:14] [W] If not already in use, locking GPU clock frequency or adding --useSpinWait may improve the stability.

I exported the .onnx using export.py from this repo, but I got an error while converting to the .engine format with TensorRT 8.5.3.1.

Edit: I guess these are just warnings :D It's working, thanks.

laugh12321 commented 5 months ago

No worries, these logs are just warnings; they don't have any impact.

UcanYusuf commented 5 months ago

Hello, how can I perform inference on video?

laugh12321 commented 5 months ago

Performing inference on a video means using a tool like OpenCV to read the video stream frame by frame and then running inference on each frame. Although this project might not yet provide a dedicated video example, the process is similar to inference on images. Here's a basic example using OpenCV to read a video and run inference on each frame:

import cv2

# Initialize OpenCV video capture object
cap = cv2.VideoCapture('your_video.mp4')

# Check if the video opened successfully
if not cap.isOpened():
    print("Error: Unable to open video file.")
    raise SystemExit(1)

# Loop through each frame of the video
while cap.isOpened():
    # Read a frame
    ret, frame = cap.read()

    # Check if frame was read successfully
    if not ret:
        break

    # Insert inference code here to perform inference on the current frame

    # Display the current frame
    cv2.imshow('Frame', frame)

    # Exit loop if 'q' key is pressed
    if cv2.waitKey(25) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

Replace 'your_video.mp4' with the path to your video file. Inside the loop, each frame is read from the video; insert your inference code at the marked location to run inference on that frame.

UcanYusuf commented 5 months ago

I have previously converted my model to the engine format with TensorRT, and I'm trying to use it. I can run it on images in a folder in this format, but when batch_size is 1, the FPS is the same as when I use the .pt format, so I set batch_size to 8. How can I use this batch_size for real-time or video inference? I mean, I don't quite understand the logic behind the batch_size.

laugh12321 commented 5 months ago

If you want to improve FPS, you can use the C++ code; the Python code still has significant room for optimization.

Setting the batch size is for processing multiple inputs simultaneously. If you only have one video, a batch size of 1 is sufficient. If you want to process multiple videos in parallel, one approach is to set the batch size to the number of videos and merge one frame from each video into a single batched input for the model. Another approach is to launch a separate inference process for each video, which allows better utilization of system resources and faster inference.
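As a rough sketch of the first approach, the loop below reads one frame from each of several same-resolution videos, stacks them into a single batch, and runs one inference call per batch. infer_batch and the video paths are hypothetical placeholders; substitute this project's actual batched inference call:

import cv2
import numpy as np

VIDEO_PATHS = ['video0.mp4', 'video1.mp4']  # hypothetical paths, one stream per batch slot

caps = [cv2.VideoCapture(path) for path in VIDEO_PATHS]
if not all(cap.isOpened() for cap in caps):
    raise SystemExit("Error: Unable to open one of the video files.")

while True:
    frames = []
    for cap in caps:
        ret, frame = cap.read()
        if not ret:
            frames = None
            break
        frames.append(frame)
    if frames is None:
        break  # stop as soon as any stream runs out of frames

    # Stack one frame per video into a single batched input (assumes equal resolutions).
    batch = np.stack(frames)  # shape: (num_videos, H, W, 3), BGR uint8

    # results = infer_batch(batch)  # hypothetical batched inference call; result i
    #                               # corresponds to the video at batch index i

for cap in caps:
    cap.release()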