roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

sv changes bbox shape for object detection with YOLOv8?? #1362

abelBEDOYA opened this issue 4 months ago (status: Open)

abelBEDOYA commented 4 months ago

Search before asking

Question

I've been using supervision, its tracker, annotators, ... Nice work!! However, I've noticed that when doing object detection with yolov8, the bounding box shape from ultralytics is changed by supervision, even though it refers to the same detection. The following screenshot shows a detected object as provided by YOLO (ultralytics.Result) before calling supervision_tracker.update(results[0]) and after passing it to the supervision tracker.

Screenshot from 2024-07-15 12-16-53

The bboxes are different. I expected they wouldn't be...

Can this bbox shape change be removed? I would like to keep the original bbox shape.

Thanks!!

Additional

No response

LinasKo commented 4 months ago

Hi @abelBEDOYA 👋

Could you share a short snippet of the code, with the print statements?

Also, to clarify, which of these are you measuring the difference between? I've put a minimal sketch for printing all three right after this list.

  1. ultralytics result (result.boxes.xyxy)
  2. Detections, as created by from_ultralytics
  3. Detections, as updated by tracker.update_with_detections
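
Something like this (illustrative only; it assumes you already have a frame from your source):

import supervision as sv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()

# frame: np.ndarray from your video source
results = model(frame)[0]
print("1. ultralytics:", results.boxes.xyxy.cpu().numpy())

detections = sv.Detections.from_ultralytics(results)
print("2. from_ultralytics:", detections.xyxy)

tracked = tracker.update_with_detections(detections)
print("3. update_with_detections:", tracked.xyxy)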
abelBEDOYA commented 4 months ago

Here is the code. It just opens the webcam with cv2 and runs callback() on the latest frame, which runs inference and tracking:

import numpy as np
import supervision as sv
from ultralytics import YOLO
import torch

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()
box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

def callback(frame: np.ndarray, _: int) -> np.ndarray:
    results = model(frame)[0]
    print('YOLO bbox: ', results.boxes.cpu().xyxy[0] if len(results.boxes.cpu().xyxy)>0 else [])
    detections = sv.Detections.from_ultralytics(results)
    detections = tracker.update_with_detections(detections)
    print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())
    print('\n \n ')
    labels = [
        f"#{tracker_id} {results.names[class_id]}"
        for class_id, tracker_id
        in zip(detections.class_id, detections.tracker_id)
    ]

    annotated_frame = box_annotator.annotate(
        frame.copy(), detections=detections)
    return label_annotator.annotate(
        annotated_frame, detections=detections, labels=labels)

import cv2

# Open the webcam (0 is the default camera index)
cap = cv2.VideoCapture(0)

# Check that the camera opened correctly
if not cap.isOpened():
    print("Error: cannot open the camera")
    exit()

while True:
    # Capture frame by frame
    ret, frame = cap.read()

    # If the frame was not received correctly, exit the loop
    if not ret:
        print("Error: cannot receive frame (stream end?). Exiting ...")
        break

    img = callback(frame, 0)
    # Display the resulting frame
    cv2.imshow('Webcam', img)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

These are the "key" lines: Screenshot from 2024-07-15 13-12-25

The output bboxes have changed (YOLO vs SV): Screenshot from 2024-07-15 13-11-54

LinasKo commented 4 months ago

Curious. Thanks for letting us know - we'll test it.

rolson24 commented 4 months ago

@abelBEDOYA, this is interesting. What version of supervision are you using? I seem to remember we fixed an issue like this a few months ago, but the fix may not be working correctly.

abelBEDOYA commented 4 months ago

$ pip show supervision
Name: supervision
Version: 0.21.0
Summary: A set of easy-to-use utils that will come in handy in any Computer Vision project
Home-page: https://github.com/roboflow/supervision
Author: Piotr Skalski
Author-email: piotr.skalski92@gmail.com
License: MIT
Location: /home/faraujo/anaconda3/lib/python3.9/site-packages
Requires: defusedxml, matplotlib, numpy, opencv-python-headless, pillow, pyyaml, scipy
Required-by: 
rolson24 commented 4 months ago

Hmm, the latest release is 0.22.0; please try that and see if it helps. In the meantime I will test your code.
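
You can upgrade with:

$ pip install -U supervision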

rolson24 commented 4 months ago

Hi @abelBEDOYA,

I think I know what your problem is. It looks like you are printing the bounding box stored in the tracked object on this line:

print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())

This prints the tracker's internal bounding box, which is tied to the position and size velocities the tracker estimates, so it may differ from the actual bounding box from the most recent frame. If you want the precise bounding box from the detector that is associated with that track, get it from the Detections object returned by tracker.update_with_detections(). That object contains the original bounding boxes from the detector, each associated with a tracker id. So if you wanted to print those bounding boxes, you would change the line to:

print('bbox from tracker sv: ', detections.xyxy[0])
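
And if you want each box paired with its tracker id, a quick sketch:

for xyxy, tracker_id in zip(detections.xyxy, detections.tracker_id):
    print(f"track #{tracker_id}: {xyxy}")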
LinasKo commented 4 months ago

I just wanted to take some time to say thanks, @rolson24. The tracker issues have been plaguing us for a while, and we've not had much time to look at it. We really appreciate you helping out!

abelBEDOYA commented 4 months ago

Okay! Thanks @rolson24! I'll also take this opportunity to ask you about the detection and track association.

My point is, I start with an ultralytics Result object which contains detections. I parse them with detections = sv.Detections.from_ultralytics(results) and then detections = tracker.update_with_detections(detections). There are some attributes that ultralytics Results can have, like keypoints and segmentation. I would like to associate those yolo detections with the sv tracks in order to give them a tracking id. That's the reason I was comparing bboxes between yolo detections and supervision detections. The association is not 1-to-1 because, for example, the number of yolo detections is not always the same as the number of sv ones.

How can this association be done?

Thanks again!

rolson24 commented 4 months ago

If you use the detections returned from tracker.update_with_detections(detections) and the Detections object has segmentation masks, then the segmentation masks from the model will be retained and have a tracker_id assigned to them.
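
For example (a sketch, assuming a yolov8n-seg checkpoint and otherwise the same setup as your script):

model = YOLO("yolov8n-seg.pt")
results = model(frame)[0]
detections = sv.Detections.from_ultralytics(results)  # detections.mask now holds the model's masks
detections = tracker.update_with_detections(detections)

# each mask shares its index with a box and a tracker_id
for tracker_id, mask in zip(detections.tracker_id, detections.mask):
    print(tracker_id, mask.shape)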

Unfortunately, the tracker does not support Keypoints right now. From what you are describing, it sounds like you would want to use a yolo-pose model which returns bboxes and keypoints, and you would want to track the objects. This may be something we add, but for now I have a somewhat hacky idea of how you may be able to do this:

import numpy as np
import supervision as sv

# assumes `model` (a YOLO pose model), `byte_tracker` (an sv.ByteTrack) and `frame` already exist
results = model(frame, imgsz=1280, verbose=False)[0]
pre_track_detections = sv.Detections.from_ultralytics(results)
keypoints = sv.KeyPoints.from_ultralytics(results)
post_track_detections = byte_tracker.update_with_detections(pre_track_detections)

pre_track_bounding_boxes = pre_track_detections.xyxy
post_track_bounding_boxes = post_track_detections.xyxy

# match pre-track and post-track boxes by IoU, reusing the tracker's internal matching utils
ious = sv.tracker.byte_tracker.matching.box_iou_batch(pre_track_bounding_boxes, post_track_bounding_boxes)
iou_costs = 1 - ious
matches, _, _ = sv.tracker.byte_tracker.matching.linear_assignment(iou_costs, 0.5)

# build a KeyPoints object whose rows line up with post_track_detections
post_track_keypoints = sv.KeyPoints.empty()
post_track_keypoints.xy = np.empty((len(post_track_detections), keypoints.xy.shape[1], 2), dtype=np.float32)
post_track_keypoints.class_id = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.confidence = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.data = keypoints.data

# copy each detection's keypoints into the row of its matched track
for i_detection, i_track in matches:
    post_track_keypoints.xy[i_track] = keypoints.xy[i_detection]
    post_track_keypoints.class_id[i_track] = keypoints.class_id[i_detection]
    post_track_keypoints.confidence[i_track] = keypoints.confidence[i_detection]

This will make it so that the keypoints in post_track_keypoints have the same index as their corresponding bounding box in post_track_detections. It's a bit hacky, but it should work. I also have a colab notebook that demonstrates it here
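
From there you could, for example, draw the tracked keypoints and label them with their tracker ids (illustrative, using supervision's VertexAnnotator and LabelAnnotator):

vertex_annotator = sv.VertexAnnotator()
label_annotator = sv.LabelAnnotator()

labels = [f"#{tracker_id}" for tracker_id in post_track_detections.tracker_id]
annotated_frame = vertex_annotator.annotate(frame.copy(), key_points=post_track_keypoints)
annotated_frame = label_annotator.annotate(annotated_frame, detections=post_track_detections, labels=labels)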