tryolabs / norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.
https://tryolabs.github.io/norfair/
BSD 3-Clause "New" or "Revised" License

Use Norfair with a bbox obtained with selectROI from OpenCV #240

Closed — moooises closed this issue 1 year ago

moooises commented 1 year ago

Describe the situation you are working on
I have been trying to use Norfair with a bounding box drawn manually using selectROI from OpenCV. I converted the selectROI output to a Norfair Detection and, with the help of a previous issue from this project, I was able to configure the Tracker class to track the bbox properly. My intention was to send the Norfair Detection converted from selectROI to the tracker's update function, and then use the estimate attribute of the TrackedObject returned by that function as the bbox for the Norfair Detection in the next frame. However, update always returns a TrackedObject with the same points as the Norfair Detection I passed in, so when I draw the rectangle on the video, it is always drawn in the same position.

Describe what is it that you need help with
I need help understanding why the update function always returns a TrackedObject whose estimate points are equal to the bbox from the previous detection, and what I can do to solve it.

Additional context
_I leave here an output video showing my problem and some of the code I wrote. It is built on top of the Yolov7 demo._


def yolo_detections_to_norfair_detections(
    bbox_xywh, convert, score=90.0, label_selected=1
) -> List[Detection]:
    norfair_detections: List[Detection] = []

    if convert:
        # Convert the xywh bbox returned by selectROI to xyxy format
        bbox = np.array(
            [
                [bbox_xywh[0], bbox_xywh[1]],
                [bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3]],
            ]
        )
    else:
        # Points are already in xyxy format
        bbox = np.array(
            [
                [bbox_xywh[0], bbox_xywh[1]],
                [bbox_xywh[2], bbox_xywh[3]],
            ]
        )
    scores = np.array([score, score])
    norfair_detections.append(
        Detection(points=bbox, scores=scores, label=label_selected)
    )

    return norfair_detections
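
As a sanity check, the xywh-to-xyxy conversion in the `convert` branch can be exercised on its own (the sample box below is made up for illustration):

```python
import numpy as np

# selectROI returns (x, y, w, h): top-left corner plus width and height
bbox_xywh = (100, 50, 40, 30)

# A bbox-style Norfair Detection expects two points: top-left and bottom-right
bbox_xyxy = np.array(
    [
        [bbox_xywh[0], bbox_xywh[1]],
        [bbox_xywh[0] + bbox_xywh[2], bbox_xywh[1] + bbox_xywh[3]],
    ]
)
print(bbox_xyxy.tolist())  # [[100, 50], [140, 80]]
```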

args.files = ["TV_2022_10_24_11_04_47.mpeg"]

for input_path in args.files:
    video = Video(input_path=input_path)

    distance_function = "iou_opt" if args.track_points == "bbox" else "euclidean"

    distance_threshold = (
        DISTANCE_THRESHOLD_BBOX
        if args.track_points == "bbox"
        else DISTANCE_THRESHOLD_CENTROID
    )

    tracker = Tracker(
        hit_counter_max=150,
        distance_function=distance_function,
        distance_threshold=distance_threshold,
        initialization_delay=0,
    )

    first_frame = True

    try:
        for frame in video:
            if first_frame:
                bbox_xywh = cv2.selectROI("Select object to track", frame)
                detections = yolo_detections_to_norfair_detections(
                    bbox_xywh, convert=first_frame
                )
                first_frame = False
            else:
                # Feed the previous estimate back in as the next "detection"
                points = tracked_objects[0].estimate
                detections = yolo_detections_to_norfair_detections(
                    [points[0, 0], points[0, 1], points[1, 0], points[1, 1]],
                    convert=first_frame,
                )

            tracked_objects = tracker.update(detections=detections)

            if args.track_points == "centroid":
                norfair.draw_points(frame, detections)
                norfair.draw_tracked_objects(frame, tracked_objects)
            elif args.track_points == "bbox":
                norfair.draw_tracked_boxes(frame, tracked_objects)

            # cv2.imshow("stream", frame)
            video.write(frame)

facundo-lezama commented 1 year ago

Hi @moooises, thanks for the detailed explanation.

Norfair is designed as what is called a tracking-by-detection tracker, meaning that it expects to receive detections in order to track objects. Motion estimation is done by a Kalman filter, which needs detection evidence to estimate new positions for the objects. No visual information is included in the estimation, so it isn't possible to estimate future positions from a single detection (the first one, defined by hand); a couple of detections are needed before positions and velocities can be estimated. That said, even if you feed the Tracker a couple of detections at first, it isn't possible to estimate the positions of the objects through the rest of the video, because Norfair's Tracker expects to be fed detections at least once in a while.

Norfair is designed to allow some frame skipping in case the user needs some extra speed, and it also allows for some missing detections, but it heavily relies on having detections to track objects.
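
A toy 1-D constant-velocity Kalman filter (an illustration only, not Norfair's actual implementation) shows why echoing the filter's own estimate back as the "measurement" can never produce motion: the innovation is always zero, so the state never changes, while real measurements do drive the estimate:

```python
import numpy as np

# Toy 1-D constant-velocity Kalman filter. State is [position, velocity].
F = np.array([[1.0, 1.0], [0.0, 1.0]])  # motion model: position += velocity
H = np.array([[1.0, 0.0]])              # we only measure position
Q = np.eye(2) * 1e-3                    # process noise
R = np.array([[1e-1]])                  # measurement noise

def kf_step(x, P, z):
    # Predict, then correct with the measurement z.
    x, P = F @ x, F @ P @ F.T + Q
    y = z - H @ x                       # innovation: new evidence vs. prediction
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return x + K @ y, (np.eye(2) - K @ H) @ P

# Case 1: echo the filter's own estimate back as the "measurement".
x, P = np.zeros(2), np.eye(2)
for _ in range(10):
    x, P = kf_step(x, P, H @ x)         # innovation is always zero
x_echo = x
print(x_echo)                           # stays at [0, 0]: no evidence, no motion

# Case 2: real measurements from an object moving 1 unit per frame.
x, P = np.zeros(2), np.eye(2)
for k in range(1, 11):
    x, P = kf_step(x, P, np.array([float(k)]))
x_tracked = x
print(x_tracked)                        # position near 10, velocity near 1
```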

Can you mention a bit more about the use case? Why do you need to select the object yourself?

moooises commented 1 year ago

Hi @facundo-lezama, thanks for the quick response.

I'm interested in having both applications: tracking with an object detector and tracking with a box drawn by hand. I'm planning to deploy them on a camera with a motion sensor, and I want to see how to make the camera move to follow a tracked object. Since I have been working with Norfair for the past month, I was trying to make this work with Norfair too. But it's OK, I will just use an OpenCV tracker for this case.

Thank you so much.

facundo-lezama commented 1 year ago

Let us know how it goes! Maybe this is something we can discuss internally and see if it fits what we want for Norfair.