Add tracking for KeyPoints

rolson24 commented 4 months ago

Search before asking

[X] I have searched the Supervision issues and found no similar feature requests.

Description

Right now there is no way to track objects that have keypoints associated with them, because the tracker cannot handle keypoints. This feature would track the objects that keypoints belong to, even when there are multiple objects in the frame. Ideally it would simply use the existing ByteTrack module to track the objects' bounding boxes and then keep the keypoints associated with each tracked object. Note that this is different from tracking each individual keypoint, which would require an entirely different tracker.

Use case

This is important for many applications where tracking keypoints through a video provides useful information. For example, in sports science, if two players are playing basketball and you want to analyze their movement, you would need to track the keypoints of each player separately.

Additional

I see several ways this could be implemented.

Option 1: Add keypoints to Detections and change all the from_model() functions

Add keypoints as another possible attribute to the Detections object, similar to the mask attribute. This would most likely involve adding keypoints to Detections and adding from_mediapipe() to the Detections class and modifying all of the other from_model() functions to support KeyPoints. Then the tracker could be used as normal on these detections objects.

Option 2: Add keypoints to Detections after a detections object has been created

The same as option 1, but instead of modifying all of the from_model() functions, the keypoints attribute stays None unless a KeyPoints object is added to the existing Detections object. This would require the indices of the keypoints to exactly match the associated detection boxes. It could also work with models that don't output bounding boxes, by creating the boxes from the keypoints. Then the tracker could be used as normal.

Option 3: Add bounding boxes and object confidence scores to the KeyPoints class

We could add bounding boxes and object confidence scores to the KeyPoints class in the same way as Detections. For the ultralytics pose models this would be easy, as they are included as outputs. For the other models this could be implemented by creating a bounding box from the keypoints of each object, and taking the confidence score as the average of the keypoint confidence values. Then the KeyPoints object could simply be sent into the object tracker. It would require a small amount of modification to the tracker, but would be relatively simple on the whole. The downside is redundancy: KeyPoints and Detections would hold some of the same information.

Option 4: Do this hacky thing

I don't like this option because it is ugly, inefficient, and slightly confusing, but it works right now without any changes.

import numpy as np
import supervision as sv

# Assumes `model`, `byte_tracker` and `frame` are already defined,
# e.g. inside a per-frame video loop.
results = model(frame, imgsz=1280, verbose=False)[0]
pre_track_detections = sv.Detections.from_ultralytics(results)
keypoints = sv.KeyPoints.from_ultralytics(results)
post_track_detections = byte_tracker.update_with_detections(pre_track_detections)

pre_track_bounding_boxes = pre_track_detections.xyxy
post_track_bounding_boxes = post_track_detections.xyxy

# Match pre-tracking boxes (rows) to tracked boxes (columns) by IoU.
ious = sv.tracker.byte_tracker.matching.box_iou_batch(pre_track_bounding_boxes, post_track_bounding_boxes)
iou_costs = 1 - ious
matches, _, _ = sv.tracker.byte_tracker.matching.linear_assignment(iou_costs, 0.5)

post_track_keypoints = sv.KeyPoints.empty()

# Zero-filled (rather than np.empty) so rows for tracks without a match hold
# defined values. Trailing shape and dtype are copied from the source arrays
# instead of being hard-coded as float32.
n_tracks = len(post_track_detections)
post_track_keypoints.xy = np.zeros((n_tracks, *keypoints.xy.shape[1:]), dtype=keypoints.xy.dtype)
post_track_keypoints.class_id = np.zeros((n_tracks, *keypoints.class_id.shape[1:]), dtype=keypoints.class_id.dtype)
post_track_keypoints.confidence = np.zeros((n_tracks, *keypoints.confidence.shape[1:]), dtype=keypoints.confidence.dtype)
post_track_keypoints.data = keypoints.data

# Copy each matched detection's keypoints into the row of its tracked box.
for i_detection, i_track in matches:
    post_track_keypoints.xy[i_track] = keypoints.xy[i_detection]
    post_track_keypoints.class_id[i_track] = keypoints.class_id[i_detection]
    post_track_keypoints.confidence[i_track] = keypoints.confidence[i_detection]
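
The re-association step in Option 4 can be checked without a model or tracker. The sketch below is NumPy-only and is not the library's implementation: `box_iou` and `match_tracks_to_detections` are hypothetical helpers, and instead of `linear_assignment` it uses a simpler per-track argmax over IoU, so two tracks could in principle pick the same detection. The boxes and keypoints are made-up values.

```python
import numpy as np


def box_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between two sets of xyxy boxes, shape (len(a), len(b))."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    inter = np.prod(np.clip(bottom_right - top_left, 0, None), axis=2)
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def match_tracks_to_detections(pre_xyxy, post_xyxy, min_iou=0.5):
    """For each tracked box, return the index of the best-overlapping
    pre-tracking detection, or -1 if no detection reaches min_iou."""
    if len(pre_xyxy) == 0 or len(post_xyxy) == 0:
        return np.full(len(post_xyxy), -1, dtype=int)
    ious = box_iou(post_xyxy, pre_xyxy)  # (n_tracks, n_detections)
    best = ious.argmax(axis=1)
    best[ious.max(axis=1) < min_iou] = -1
    return best


# Made-up data: two detections, and a hypothetical tracker output that
# returns the same objects reordered and slightly shifted.
pre = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
post = np.array([[21, 20, 31, 30], [0, 1, 10, 11]], dtype=np.float32)
keypoints_xy = np.array(
    [[[1, 1], [9, 9]], [[21, 21], [29, 29]]], dtype=np.float32
)

source = match_tracks_to_detections(pre, post)
# Keep only tracks that found a detection, in track order.
tracked_keypoints_xy = keypoints_xy[source[source >= 0]]
```

Here `source[i]` is the detection index whose keypoints belong to track `i`, so the keypoints end up in the same order as the tracked boxes.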

Are you willing to submit a PR?

[x] Yes I'd like to help by submitting a PR!

AyuK03 commented 1 month ago

Hi, it is my first time contributing to the project and I would like to work on this issue :)

onuralpszr commented 1 month ago

> Hi, it is my first time contributing to the project and I would like to work on this issue :)

Please don't post the same message on every single issue. Since this is your first contribution, please read the issue, make sure you understand it, try implementing it in a fork, and show us a Google Colab; then we can proceed. So please slow down a little :)

Thank you.

AHB102 commented 4 weeks ago

@onuralpszr @rolson24 Any other thoughts? I'm leaning towards option 1. It's tedious, but I think it'll be better in the long run.

LinasKo commented 3 weeks ago

How about adding a function to convert a subset of KeyPoints to Detections? The user would only need 1 extra code block:

https://colab.research.google.com/drive/1VC0txQ6sZLkvnqH6J6opxzdf63oULKFT?usp=sharing

@rolson24, @onuralpszr, @SkalskiP, any thoughts?
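
For readers without access to the notebook, the core of such a conversion (and of the box-from-keypoints idea in Options 2 and 3) can be sketched in plain NumPy. This is an illustration only, not the code from the Colab or from the library: `keypoints_to_boxes` and its `selected_indices` parameter are hypothetical names, and the arrays are made-up values shaped like `KeyPoints.xy` (n, m, 2) and `KeyPoints.confidence` (n, m). It does not handle models that may encode missing keypoints as (0, 0).

```python
import numpy as np


def keypoints_to_boxes(xy, confidence=None, selected_indices=None):
    """Build one xyxy box per object from its keypoints.

    xy: array of shape (n, m, 2); confidence: optional array of shape (n, m).
    selected_indices: optional list of keypoint indices to use (a subset).
    Returns (boxes of shape (n, 4), scores of shape (n,) or None).
    """
    xy = np.asarray(xy, dtype=np.float32)
    if confidence is not None:
        confidence = np.asarray(confidence, dtype=np.float32)
    if selected_indices is not None:
        xy = xy[:, selected_indices]
        if confidence is not None:
            confidence = confidence[:, selected_indices]
    # The box spans the min and max keypoint coordinates of each object.
    boxes = np.concatenate([xy.min(axis=1), xy.max(axis=1)], axis=1)
    # The object score is the mean keypoint confidence, as in Option 3.
    scores = None if confidence is None else confidence.mean(axis=1)
    return boxes, scores


# Made-up keypoints for two objects with three keypoints each.
xy = np.array(
    [
        [[10, 20], [30, 40], [20, 60]],
        [[100, 100], [120, 90], [110, 130]],
    ],
    dtype=np.float32,
)
confidence = np.array([[0.9, 0.8, 0.7], [0.6, 0.6, 0.9]], dtype=np.float32)

all_boxes, all_scores = keypoints_to_boxes(xy, confidence)
subset_boxes, subset_scores = keypoints_to_boxes(
    xy, confidence, selected_indices=[0, 1]
)
```

Using a subset (for example only torso keypoints) can give a tighter, more stable box than using every keypoint, which is presumably why the conversion operates on a selectable subset.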

LinasKo commented 2 weeks ago

Keypoint tracking is added via conversion to Detections: https://github.com/roboflow/supervision/pull/1658

Docs: https://supervision.roboflow.com/develop/how_to/track_objects/#tracking-key-points

However, I suggest we keep this issue open as it's not equivalent to tracking the keypoints directly.
