jkjung-avt / tensorrt_demos

TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
https://jkjung-avt.github.io/
MIT License

Bounding box coordinates #296

Closed · kubs0ne closed this issue 3 years ago

kubs0ne commented 3 years ago

Hello, I have a question: is it possible to modify the trt_yolo.py file so that it shows the coordinates of the bounding boxes? Thank you for any information.

jkjung-avt commented 3 years ago

Check out this simplified example:

import cv2
import pycuda.autoinit  # initializes the CUDA driver/context

from utils.yolo_with_plugins import TrtYOLO

img = cv2.imread('dog.jpg')
trt_yolo = TrtYOLO('yolov4-416', (416, 416), 80)  # model name, input shape, number of classes
print(trt_yolo.detect(img))  # prints (boxes, confidence scores, class ids)

kubs0ne commented 3 years ago

Thank you for your response. The script returns something like this:

(array([[579, 317, 649, 480], [315, 194, 468, 478], [311, 12, 490, 410]]), array([0.9885378 , 0.83986324, 0.9923714 ], dtype=float32), array([16., 1., 0.], dtype=float32))

I don't know how to interpret this output, can you help me? Also, I was wondering if it's possible to do it in live video detection with a simple script like this.

jkjung-avt commented 3 years ago

array([[579, 317, 649, 480], [315, 194, 468, 478], [311, 12, 490, 410]])

These are bounding box coordinates (x1, y1, x2, y2) of the 3 objects detected.

array([0.9885378 , 0.83986324, 0.9923714 ], dtype=float32)

These are confidence scores (between 0 and 1) of the detections.

array([16., 1., 0.], dtype=float32)

These are the class IDs of the detected objects. Referring to COCO_CLASS_LIST, class 16 is dog, class 1 is bicycle, and class 0 is person.
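
If you want the results printed in a readable form, here is a minimal sketch that unpacks the tuple returned by detect(). The small CLS_NAMES dict is only for illustration and covers just the three class IDs seen above; the repo's full COCO class list is the real reference.

import cv2
import pycuda.autoinit  # initializes the CUDA driver/context

from utils.yolo_with_plugins import TrtYOLO

# hypothetical mapping, only the three class ids from this example
CLS_NAMES = {0: 'person', 1: 'bicycle', 16: 'dog'}

img = cv2.imread('dog.jpg')
trt_yolo = TrtYOLO('yolov4-416', (416, 416), 80)

boxes, confs, clss = trt_yolo.detect(img)
for (x1, y1, x2, y2), conf, cls in zip(boxes, confs, clss):
    name = CLS_NAMES.get(int(cls), 'class %d' % int(cls))
    print('%s: %.2f, box=(%d, %d, %d, %d)' % (name, conf, x1, y1, x2, y2))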

if it's possible to do it in live video detection, with a simple script like this.

Yes, for example:

import cv2
import pycuda.autoinit  # initializes the CUDA driver/context

from utils.yolo_with_plugins import TrtYOLO

trt_yolo = TrtYOLO('yolov4-416', (416, 416), 80)
cap = cv2.VideoCapture(0)  # USB webcam 0
while True:
    _, img = cap.read()
    if img is None:
        break
    print(trt_yolo.detect(img))  # (boxes, confidence scores, class ids)
cap.release()
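
If you also want to see the boxes on screen, a rough sketch like the one below should work; it uses plain OpenCV drawing calls instead of the repo's visualization helpers, so the colors and labels are kept minimal.

import cv2
import pycuda.autoinit  # initializes the CUDA driver/context

from utils.yolo_with_plugins import TrtYOLO

trt_yolo = TrtYOLO('yolov4-416', (416, 416), 80)
cap = cv2.VideoCapture(0)  # USB webcam 0
while True:
    _, img = cap.read()
    if img is None:
        break
    boxes, confs, clss = trt_yolo.detect(img)
    for (x1, y1, x2, y2), conf, cls in zip(boxes, confs, clss):
        # draw the box and a "class_id confidence" label above it
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(img, '%d %.2f' % (int(cls), conf), (int(x1), int(y1) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow('TRT YOLO', img)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()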

kubs0ne commented 3 years ago

Thank you very much for your help! It was really simple :)