Open abelBEDOYA opened 4 months ago
Hi @abelBEDOYA 👋
Could you share a short snippet of the code, with the print statements?
Also, to clarify, which of these are you measuring the difference between?
result.boxes.xyxy
sv.Detections.from_ultralytics()
tracker.update_with_detections
Here is the code. It just opens the webcam with cv2 and runs callback() on the latest frame, which runs inference and tracking:
import cv2
import numpy as np
import supervision as sv
import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()
box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

def callback(frame: np.ndarray, _: int) -> np.ndarray:
    results = model(frame)[0]
    print('YOLO bbox: ', results.boxes.cpu().xyxy[0] if len(results.boxes.cpu().xyxy)>0 else [])
    detections = sv.Detections.from_ultralytics(results)
    detections = tracker.update_with_detections(detections)
    print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())
    print('\n \n ')
    labels = [
        f"#{tracker_id} {results.names[class_id]}"
        for class_id, tracker_id
        in zip(detections.class_id, detections.tracker_id)
    ]
    annotated_frame = box_annotator.annotate(
        frame.copy(), detections=detections)
    return label_annotator.annotate(
        annotated_frame, detections=detections, labels=labels)

# Open the webcam (0 is the default camera index)
cap = cv2.VideoCapture(0)

# Check that the camera opened correctly
if not cap.isOpened():
    print("Error: Cannot open the camera")
    exit()

while True:
    # Capture frame by frame
    ret, frame = cap.read()
    # If the frame was not received correctly, leave the loop
    if not ret:
        print("Error: Cannot receive frame (stream end?). Exiting ...")
        break
    img = callback(frame, 0)
    # Show the resulting frame
    cv2.imshow('Webcam', img)
    # Press 'q' to leave the loop
    if cv2.waitKey(1) == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()
These are the "key" lines:
The output bboxes have changed (YOLO vs. SV):
Curious. Thanks for letting us know - we'll test it.
@abelBEDOYA, this is interesting. What version of supervision are you using? I seem to remember this was an issue we fixed a few months ago, but the fix may not be working correctly.
$ pip show supervision
Name: supervision
Version: 0.21.0
Summary: A set of easy-to-use utils that will come in handy in any Computer Vision project
Home-page: https://github.com/roboflow/supervision
Author: Piotr Skalski
Author-email: piotr.skalski92@gmail.com
License: MIT
Location: /home/faraujo/anaconda3/lib/python3.9/site-packages
Requires: defusedxml, matplotlib, numpy, opencv-python-headless, pillow, pyyaml, scipy
Required-by:
Hmm, the latest release is 0.22.0, please try the latest one and see if it helps. In the meantime I will test your code.
Hi @abelBEDOYA,
I think I know what your problem is. It looks like you are printing the bounding box stored in the tracked object in this line:
print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())
This prints the internal bounding box that the tracker maintains for the track. That box is tied to the position and size velocities estimated inside the tracker, so it may differ from the actual bounding box in the most recent frame. If you want the precise bounding box from the detector that is associated with that track, get it from the Detections object returned by tracker.update_with_detections(). This object contains the original bounding boxes from the detector, each associated with a tracker id.
So if you wanted to print those bounding boxes, you would change the line to:
print('bbox from tracker sv: ', detections.xyxy[0])
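As a rough illustration of why the tracker's internal box can differ from the raw detection: ByteTrack smooths each track with a Kalman filter, so the stored state is a blend of the motion prediction and the new measurement. The sketch below is a toy one-dimensional constant-velocity filter applied to a single box coordinate. It is not supervision's implementation; kalman_step, the noise values q and r, and the sample measurements are all made up for illustration.

```python
def kalman_step(state, cov, z, q=0.01, r=1.0):
    """One predict+update step of a toy 1D constant-velocity Kalman filter.

    state: (position, velocity); cov: 2x2 covariance as nested tuples;
    z: measured position; q, r: hypothetical process/measurement noise.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: position advances by velocity (one frame), covariance grows.
    p = p + v
    n00 = p00 + p01 + p10 + p11 + q
    n01 = p01 + p11
    n10 = p10 + p11
    n11 = p11 + q

    # Update: blend the prediction with the measurement using the Kalman gain.
    s = n00 + r
    k0, k1 = n00 / s, n10 / s
    y = z - p  # innovation (measurement minus prediction)
    p, v = p + k0 * y, v + k1 * y
    new_cov = (((1 - k0) * n00, (1 - k0) * n01),
               (n10 - k1 * n00, n11 - k1 * n01))
    return (p, v), new_cov


# Hypothetical x1 coordinates reported by the detector on successive frames.
measurements = [100.0, 104.0, 99.0, 103.0]

state, cov = (100.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
filtered = []
for z in measurements:
    state, cov = kalman_step(state, cov, z)
    filtered.append(state[0])

for z, f in zip(measurements, filtered):
    print(f"detector: {z:7.2f}   filtered: {f:7.2f}")
```

The filtered value lands between the prediction and the measurement, which is the kind of gap seen when comparing tlbr on a track with results.boxes.xyxy.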
I just wanted to take some time to say thanks, @rolson24. The tracker issues have been plaguing us for a while, and we've not had much time to look at it. We really appreciate you helping out!
Okay! Thanks @rolson24! I'd also like to take this opportunity to ask you about associating detections with tracks.
I start with an ultralytics Results object, which contains the detections. I convert it with detections = sv.Detections.from_ultralytics(results) and then call detections = tracker.update_with_detections(detections). Ultralytics Results can carry extra attributes such as keypoints and segmentation. I would like to associate those yolo detections with the sv tracks in order to give them a tracking id. That is the reason I was comparing bboxes between the yolo detections and the supervision detections. The association is not one-to-one because, for example, the number of yolo detections is not always the same as the number of sv ones.
How can this association be done?
Thanks again!
If you use the detections returned from tracker.update_with_detections(detections) and the Detections object has segmentation masks, then the segmentation masks from the model will be retained and have a tracker_id assigned to them.
Unfortunately, the tracker does not support Keypoints right now. From what you are describing, it sounds like you would want to use a yolo-pose model which returns bboxes and keypoints, and you would want to track the objects. This may be something we add, but for now I have a somewhat hacky idea of how you may be able to do this:
# Build detections and keypoints from the same model result
results = model(frame, imgsz=1280, verbose=False)[0]
pre_track_detections = sv.Detections.from_ultralytics(results)
keypoints = sv.KeyPoints.from_ultralytics(results)

# byte_tracker is the sv.ByteTrack instance
post_track_detections = byte_tracker.update_with_detections(pre_track_detections)

pre_track_bounding_boxes = pre_track_detections.xyxy
post_track_bounding_boxes = post_track_detections.xyxy

# Match pre-track boxes to post-track boxes by IoU
ious = sv.tracker.byte_tracker.matching.box_iou_batch(pre_track_bounding_boxes, post_track_bounding_boxes)
iou_costs = 1 - ious
matches, _, _ = sv.tracker.byte_tracker.matching.linear_assignment(iou_costs, 0.5)

# Allocate keypoints aligned with the post-track detections
post_track_keypoints = sv.KeyPoints.empty()
post_track_keypoints.xy = np.empty((len(post_track_detections), keypoints.xy.shape[1], 2), dtype=np.float32)
post_track_keypoints.class_id = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.confidence = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.data = keypoints.data

# Copy each matched detection's keypoints to the index of its track
for i_detection, i_track in matches:
    post_track_keypoints.xy[i_track] = keypoints.xy[i_detection]
    post_track_keypoints.class_id[i_track] = keypoints.class_id[i_detection]
    post_track_keypoints.confidence[i_track] = keypoints.confidence[i_detection]
This will make it so that the keypoints in post_track_keypoints have the same index as their corresponding bounding box in post_track_detections. It's kind of hacky, but it should work. I also have a colab notebook that demonstrates it here
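The matching idea can also be sketched without any library calls. The following is a simplified stand-in that uses greedy IoU matching instead of the linear assignment used above, and it assumes boxes are plain (x1, y1, x2, y2) tuples. The names box_iou, greedy_match, and the sample boxes and keypoint labels are hypothetical and only serve the illustration.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(pre_boxes, post_boxes, min_iou=0.5):
    """Pair pre-track and post-track boxes, highest IoU first.

    Returns a list of (pre_index, post_index); each index is used at most once.
    """
    pairs = sorted(
        ((box_iou(p, q), i, j)
         for i, p in enumerate(pre_boxes)
         for j, q in enumerate(post_boxes)),
        reverse=True,
    )
    used_pre, used_post, matches = set(), set(), []
    for iou, i, j in pairs:
        if iou < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if i in used_pre or j in used_post:
            continue
        used_pre.add(i)
        used_post.add(j)
        matches.append((i, j))
    return matches


# Hypothetical data: the tracker returned two of three boxes, in a different order.
pre_boxes = [(0, 0, 10, 10), (20, 20, 40, 40), (50, 50, 60, 60)]
post_boxes = [(20, 20, 40, 40), (0, 0, 10, 10)]
pre_keypoints = ["kp_a", "kp_b", "kp_c"]  # placeholder per-detection attributes

matches = greedy_match(pre_boxes, post_boxes)

# Re-index the extra attributes so they line up with the tracked boxes.
post_keypoints = [None] * len(post_boxes)
for i_detection, i_track in matches:
    post_keypoints[i_track] = pre_keypoints[i_detection]

print(post_keypoints)
```

Greedy matching is easier to read but can pick a worse overall pairing than linear assignment when boxes overlap heavily, so treat it as an illustration of the re-indexing step rather than a replacement.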
Question
I've been using supervision, its tracker, annotators, ... Nice work!! However, I've noticed that when doing object detection with yolov8, the bbox shape from ultralytics is changed by supervision even though it refers to the same detection. The following screenshot shows a detected object provided by YOLO (ultralytics.Result) before calling supervision_tracker.update(results[0]) and after passing it to supervision_tracker. The bboxes are different. I expected them to be the same.
Can this bbox shape change be removed? I would like to keep the original bbox shape.
Thanks!!