roboflow / supervision

We write your reusable computer vision tools. 💜
https://supervision.roboflow.com
MIT License

Problems with tracker.update_with_detections(detections) #1215

Open CodingMechineer opened 4 months ago

CodingMechineer commented 4 months ago

Search before asking

Bug

Somehow, I lose predicted bounding boxes in this line:

tracker.update_with_detections(detections)

In the plot from Ultralytics, everything is fine. However, after the line above executes, I lose some bounding boxes. In this example, I lose two.

This is the plot from Ultralytics, how it should look: (image)

This is the plot after the Roboflow labeling; some predictions are missing: (image)

Can somebody help me with this issue?

Environment

Minimal Reproducible Example

Code:

import cv2
import supervision as sv
from ultralytics import YOLO

model_path = "path/to/your/model.pt"
video_path = "path/to/your/video.mp4"

cap = cv2.VideoCapture(video_path)
model = YOLO(model_path)
box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()
tracker = sv.ByteTrack()

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame, verbose=False)[0]
    print(f"CLS_YOLO-model: {results.boxes.cls}")

    results_2 = model.predict(
        frame,
        show=True,  # The plot from the Ultralytics library
        conf=0.5,
        save=False,
    )

    detections = sv.Detections.from_ultralytics(results)
    print(f"ClassID_Supervision_1: {detections.class_id}") # Between this and the next print, predictions are lost

    detections = tracker.update_with_detections(detections) # The detections get lost here

    labels = [
        f"{results.names[class_id]} {confidence:0.2f}"
        for confidence, class_id
        in zip(detections.confidence, detections.class_id)
    ]

    print(f"ClassID_Supervision_2: {detections.class_id}") # Here two predictions from the Ultralytics model are lost

    annotated_frame = frame.copy()

    annotated_frame = box_annotator.annotate(
        annotated_frame,
        detections
    )

    labeled_frame = label_annotator.annotate(
        annotated_frame,
        detections,
        labels
    )

    print(f"ClassID_Supervision_3: {detections.class_id}")
    print(f"{len(detections)} detections, Labels: {labels}", )

    cv2.imshow('Predictions', labeled_frame)  # The frame annotated with supervision
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Prints in console:

CLS_YOLO-model: tensor([1., 1., 1., 1.], device='cuda:0')  --> Class IDs of the predicted bounding boxes
ClassID_Supervision_1: [1 1 1 1]  --> Converted into supervision
ClassID_Supervision_2: [1 1]  --> After the tracker method, class IDs are lost
ClassID_Supervision_3: [1 1]
2 detections, Labels: ['Spot 0.87', 'Spot 0.86']

Additional

No response

Are you willing to submit a PR?

LinasKo commented 4 months ago

Hi @CodingMechineer 👋

Let's do one quick test - does installing supervision==0.21.0.rc5 change anything?

SkalskiP commented 4 months ago

@CodingMechineer, we accidentally shipped a tracking bug in supervision==0.20.0. Try using supervision==0.19.0 or supervision==0.21.0.rc5 pre-release.

CodingMechineer commented 4 months ago

@LinasKo @SkalskiP I installed both supervision==0.21.0.rc5 and supervision==0.19.0, but I have the same problem with both versions.

Top: YOLO predictions. Bottom: supervision tracker. (screenshot)

SkalskiP commented 4 months ago

@CodingMechineer Could you share with us the exact version of the model and the video file you are using?

CodingMechineer commented 4 months ago

@SkalskiP Sure! Please let me know if there is an issue on my end. I zipped the video, the model, the code, and a requirements.txt file. Unfortunately, the video and model files are too big, so GitHub doesn't let me upload everything here.

https://1drv.ms/u/s!AjTS76M8DCeYm8djbuYtiGXfXFNvsQ?e=GleN38

LinasKo commented 3 months ago

Hi @CodingMechineer :wave:

The tracker uses detection overlap and motion-model predictions to estimate which detections represent the same object in sequential frames. It then filters out whatever it can't match. While the details are a bit complicated, a quick way to influence the result is to increase the object area shown to the tracker.

So, my quick suggestion: check if padding the boxes solves your problem.

That means inserting detections.xyxy = sv.pad_boxes(detections.xyxy, px=10, py=10) between the calls to from_ultralytics and update_with_detections.
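
In the loop from the minimal example above, that would look roughly like this (px=10, py=10 are just starting values to tune):

detections = sv.Detections.from_ultralytics(results)

# Grow each box by 10 px on every side so detections in consecutive
# frames overlap more before the tracker tries to match them.
detections.xyxy = sv.pad_boxes(detections.xyxy, px=10, py=10)

detections = tracker.update_with_detections(detections)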

Here's how it looks on my end. This way, all holes are detected, even after tracking: (image)

Does that solve your problem? :wink:

CodingMechineer commented 3 months ago

Unfortunately not, some spots and peanuts are still not tracked.

(image)

LinasKo commented 3 months ago

That's unfortunate. Here are the next steps to try:

  1. I assume you're already using supervision==0.21.0.rc5 - only later versions have pad_boxes. If not, you should switch to supervision==0.21.0.rc5.
  2. Next, trying out a few parameter values might help, especially since I think padding already captures 99% of the cases (on my machine the padding worked really well, applied the same way you did):
    1. Try changing px and py in pad_boxes.
    2. Try setting a different track_activation_threshold and minimum_matching_threshold in the tracker. If the expected FPS differs from 30, set frame_rate as a tracker argument too. See the sketch below.
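
For illustration, here's roughly how those arguments are passed when constructing the tracker (the values below are only starting points to tune):

import supervision as sv

tracker = sv.ByteTrack(
    track_activation_threshold=0.25,  # confidence needed to activate a track
    minimum_matching_threshold=0.8,   # how strict box matching is between frames
    frame_rate=30,                    # set this to the video's actual FPS
)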

Are you always running this on videos, or on a stream too? If both, I wonder if it performs similarly on the live stream and the video of the same stream.

SkalskiP commented 3 months ago

@LinasKo, do you have any idea why this happens?

LinasKo commented 3 months ago

@SkalskiP, no. I dug for an hour or so, plotted sequential detections (there's typically >50% overlap, yet they disappear). I played with some values, but I'd need to plot/print out steps of the algorithm to learn how it sees the world.

SkalskiP commented 3 months ago

@LinasKo this should not happen. I'm worried because I have no idea why it's happening. @rolson24 would you have time to take a look?

CodingMechineer commented 3 months ago

> Are you always running this on videos, or on a stream too? If both, I wonder if it performs similarly on the live stream and the video of the same stream.

I may run this on a stream in the future. Currently, I only run it on video files. To do the same task, I also tried the Ultralytics library; that works completely fine, so I'll continue with that.

The code looks something like this:

import cv2
from ultralytics import YOLO

model_path = 'best.pt'
video_path = '../001 - DATA/099 - Test_videos/Test_video_0.avi'

cap = cv2.VideoCapture(video_path)
model = YOLO(model_path)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    results = model.track(frame, persist=True)

    if results[0].boxes.id is not None:
        boxes = results[0].boxes.xyxy.cpu()
        track_ids = results[0].boxes.id.int().cpu().tolist()
        clss = results[0].boxes.cls.cpu().tolist()
        confs = results[0].boxes.conf.cpu().tolist()

    # Do all the plotting and processing

cap.release()

Maybe this can help you. Please let me know if I can do anything else for you.

SkalskiP commented 3 months ago

@CodingMechineer btw, if you use model.track in Ultralytics, you can still use detections = sv.Detections.from_ultralytics(results), and the tracker_id will be extracted from the result object.
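
A minimal sketch of that approach (paths and the single test frame are placeholders):

import cv2
import supervision as sv
from ultralytics import YOLO

model = YOLO("best.pt")
frame = cv2.imread("frame.jpg")  # any frame, just for illustration

# Let Ultralytics run its own tracker, then convert the result.
results = model.track(frame, persist=True)[0]
detections = sv.Detections.from_ultralytics(results)

# Track IDs carried over from Ultralytics (None on frames where
# the tracker has no active tracks).
print(detections.tracker_id)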

SkalskiP commented 3 months ago

@LinasKo and @CodingMechineer, is that issue still active?

LinasKo commented 3 months ago

Yup, we'll need to look at this in the future.

rolson24 commented 3 months ago

> @LinasKo this should not happen. I'm worried because I have no idea why it's happening. @rolson24 would you have time to take a look?

Hi @SkalskiP, sorry, I have been super busy with school. I can take a look at this and try to see what is going on.

rolson24 commented 3 months ago

Hi @CodingMechineer, @SkalskiP, and @LinasKo

I took a look at it with @CodingMechineer's code, and the tracker seems to be working as expected. Unfortunately, it fails in this case because the motion predictor (a Kalman filter) in the tracker uses the first two frames of a track to determine an object's speed and direction. For the first association, between the initial detection frame and the second detection frame, the tracker therefore uses the overlap between the two detections, NOT the overlap between a motion prediction and the second frame.

In this specific video the objects move very quickly, so for the first two frames of some tracks there is almost no overlap (the untracked ones are <30%; it needs to be >30%), meaning no track gets established. The tracker must then start over with an entirely new track for that object, because it could not establish a motion model, and for the next frame it has no hope of the overlap between frame 1 and frame 3 exceeding 30%.

I improved the performance for this example by setting minimum_matching_threshold to 0.9 when initializing ByteTrack(), and I improved it further by adjusting the parameter that activates a new track to 0.9 (so the initial two-frame overlap only needs to be greater than 10% rather than 30%). This second change is in the source code; it was mainly to test my theory of what is happening, and I would not recommend changing it.
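
For reference, the first of those two changes needs only the public API:

import supervision as sv

# Looser matching: accept associations with much lower frame-to-frame
# box overlap than the default, which helps fast-moving objects.
tracker = sv.ByteTrack(minimum_matching_threshold=0.9)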

For your specific example @CodingMechineer, if tracking performance is essential to your project, I believe you would either need to record at a higher framerate or slow down the device being used to sort the peanuts. Both options would reduce how far an object moves between frames. The last thing we could try is to initialize the motion model to assume general left-to-right motion, if this specific location is the only place you need to run this code. That would let the tracker pick up objects better in the first two frames. Unfortunately, it would require changing the source code a bit and messing around with the Kalman filter, something I am willing to help with, but we would probably not be able to put it into the supervision API.

@LinasKo @SkalskiP This issue seems to come from how ByteTrack is designed to be flexible for varied and unexpected tracking scenarios. If we wanted to fix tracking for these kinds of repetitive, predictable computer vision tasks, it would be better to design a second tracker that handles high-speed, predictable motion.

CodingMechineer commented 3 months ago

Thank you for your detailed investigation @rolson24!

I made the same observation regarding the framerate as you explained. With the video from the example, I had the stated problem only with the Roboflow library, not with Ultralytics. However, when the device runs faster at the same framerate, I see the same problems with the Ultralytics library too. Thus, I must make sure the movement of the objects between frames is small enough that the tracking works satisfactorily. The object movement between frames was probably just below the threshold where it worked with Ultralytics but not with Roboflow.

In summary, I need to make sure my framerate is high enough that the object overlap between frames is big enough for the tracking to work. Hence, there is no need to change the source code.

Thanks everybody for your help!