tryolabs / norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.
https://tryolabs.github.io/norfair/
BSD 3-Clause "New" or "Revised" License

How to start? #294

Closed utility-aagrawal closed 7 months ago

utility-aagrawal commented 8 months ago

Hi All,

I have just discovered this library and think it might be useful for a face tracking application I am working on. I already have a working detector. I want to understand how to integrate it with norfair to enable tracking.

I saw some example code here: https://tryolabs.github.io/norfair/2.2/reference/

I want to understand what my detector needs to return for this piece of code: detections = detector(frame)

When I read the definition of the Detection class here: https://github.com/tryolabs/norfair/blob/009a1b171ab14336d79d7b7b02dfa5f45066c79e/norfair/tracker.py#L748, it expects points as a (num_points, num_dimensions) shaped array, where num_dimensions is 2 or 3. So, it's not the bounding box coordinates that I should pass for num_dimensions?

Please advise. Appreciate your help!

aguscas commented 8 months ago

Hello! You need to convert the detections of your model to a list of Detection instances. In your case, since you are working with bounding boxes, you can treat each box as 2 points (two opposite corners of the bounding box).

For example, if one of your detections has coordinates bbox = np.array([[x0, y0], [x1, y1]]), you can define a norfair detection as detection = Detection(bbox).

As you can see in the definition you linked, you can also provide the scores (the scores argument), and also class identifiers (the label argument) which might be useful if you have more than one type of class in your use case.

To give a fully explicit example: if you have a bounding box with coordinates [[125, 312], [214, 489]], a score of 0.75, and class id 3, the corresponding norfair detection would look like this:

import numpy as np
from norfair import Detection

score = np.array([0.75, 0.75])  # one score per point, so the box score is repeated
bbox = np.array([[125, 312], [214, 489]])  # two corners: [x0, y0] and [x1, y1]
label = 3

detection = Detection(bbox, scores=score, label=label)

You will need to make a list of Detection instances in the current frame, and pass it to the Tracker.update method in order to do the tracking.
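As a rough sketch of that step, assume a detector that returns one row per box in the form (x0, y0, x1, y1, score, class_id). That row format and the helper names rows_to_detection_kwargs and track_frame are hypothetical, chosen only for illustration; norfair itself only needs the resulting list of Detection instances.

```python
import numpy as np


def rows_to_detection_kwargs(rows):
    """Turn rows of (x0, y0, x1, y1, score, class_id) into Detection keyword arguments.

    The row layout is an assumption about the detector, not a norfair requirement.
    """
    kwargs_list = []
    for x0, y0, x1, y1, score, class_id in rows:
        # A bounding box becomes 2 points: two opposite corners.
        points = np.array([[x0, y0], [x1, y1]], dtype=float)
        # One score per point, so the single box score is repeated.
        scores = np.array([score, score], dtype=float)
        kwargs_list.append({"points": points, "scores": scores, "label": int(class_id)})
    return kwargs_list


def track_frame(tracker, rows):
    """Build the norfair detections for one frame and pass them to the tracker."""
    # Imported here so the conversion helper above works without norfair installed.
    from norfair import Detection

    detections = [Detection(**kwargs) for kwargs in rows_to_detection_kwargs(rows)]
    return tracker.update(detections=detections)
```

With norfair installed, the tracker would be created once, for example tracker = Tracker(distance_function="iou", distance_threshold=0.5), and track_frame(tracker, my_detector(frame)) called on every frame. The distance name and threshold are illustrative values that assume a norfair 2.x release accepting distance names as strings, and my_detector stands in for your own model.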

Looking at the code in some of our demos with particular detectors may also help, for example the demo with YoloV5, which tracks bounding boxes (as in your case) whenever the argument --track-points is set to 'bbox'.

Don't hesitate to ask if you need further guidance.

utility-aagrawal commented 7 months ago

Thanks for the detailed response, @aguscas ! I really appreciate it! That makes sense.

utility-aagrawal commented 7 months ago

@aguscas , just a small clarification: do I need to provide a score for every point rather than one for the whole bounding box? I see that your example has just one bounding box but two scores. Thanks!

aguscas commented 7 months ago

@utility-aagrawal , yep, for the moment that's the way norfair works. The reason is that norfair can also work with keypoint detections, where each point might have a different score.

Maybe that is something we should change in the future, so that you don't need to do this when you only have a single score for the whole object. It can be a little annoying to have to define the array of scores.
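Until then, a small helper can hide the repetition. The sketch below is not part of norfair; per_point_scores is a hypothetical name. It expands a single object-level score to one score per point and passes genuine per-point scores (the keypoint case) through unchanged.

```python
import numpy as np


def per_point_scores(points, score):
    """Return an array with one score per point.

    `score` may be a single number (repeated for every point) or a
    sequence that already holds one score per point.
    """
    points = np.asarray(points)
    scores = np.asarray(score, dtype=float)
    num_points = points.shape[0]
    if scores.ndim == 0:
        # Single score for the whole object: repeat it for each point.
        return np.full(num_points, float(scores))
    if scores.shape != (num_points,):
        raise ValueError(f"expected {num_points} scores, got shape {scores.shape}")
    return scores
```

The result can be passed as the scores argument of Detection, for example per_point_scores(bbox, 0.75) for the bounding box from the earlier example.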

utility-aagrawal commented 7 months ago

Makes sense, @aguscas ! Thanks so much for the quick turnaround!

aguscas commented 7 months ago

I closed this issue, but feel free to open another one if you need more help.