tryolabs / norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.
https://tryolabs.github.io/norfair/
BSD 3-Clause "New" or "Revised" License

Any tips on how to use Norfair for re-identification across cameras #312

Closed: GeorgePearse closed this issue 2 months ago

GeorgePearse commented 3 months ago

I work on a project with 15 cameras, each covering a neighbouring area.

We send a notification to the client the first time we see an object. Normally an object appears in only one of those cameras, but it would be nice to protect against the case where it moves across all 15 and we end up sending 15 notifications. How could we go about implementing this?

I can't quite work out how to think about it. Re-identification is key, but how do we share a common re-identification pool across all of the streams?
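One way to picture a shared re-identification pool is a single gallery of appearance embeddings that every camera pipeline queries before notifying. The following is a minimal, hypothetical sketch in plain Python, not part of Norfair: `GlobalReIDPool`, `match_or_register`, `cosine_distance` and the 0.3 threshold are all assumptions, and it presumes each new track comes with an appearance embedding from some re-ID model.

```python
import math
import threading


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 for identical directions, up to 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 2.0  # a zero embedding carries no appearance info: never match it
    return 1.0 - dot / norm


class GlobalReIDPool:
    """Hypothetical gallery of already-notified objects, shared by every camera."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold     # assumed value; tune it on your own footage
        self.gallery = []              # list of (global_id, embedding) pairs
        self.next_id = 0
        self._lock = threading.Lock()  # camera threads may query concurrently

    def match_or_register(self, embedding):
        """Return (global_id, is_new); is_new=True means a notification should be sent."""
        with self._lock:
            best_id, best_dist = None, float("inf")
            for gid, stored in self.gallery:
                d = cosine_distance(embedding, stored)
                if d < best_dist:
                    best_id, best_dist = gid, d
            if best_id is not None and best_dist < self.threshold:
                return best_id, False  # seen before, possibly by another camera
            gid = self.next_id
            self.next_id += 1
            self.gallery.append((gid, list(embedding)))
            return gid, True


pool = GlobalReIDPool(threshold=0.3)
print(pool.match_or_register([0.9, 0.1, 0.0]))     # first sighting, e.g. camera 3 -> (0, True)
print(pool.match_or_register([0.88, 0.12, 0.01]))  # similar look, e.g. camera 4 -> (0, False)
```

Each camera's pipeline would call `match_or_register` once, when a track is first confirmed, and notify only when `is_new` is True. The lock only covers threads within one process; if the cameras run in separate processes or machines, the gallery would have to live in a shared service instead. You would probably also want to expire old entries so the pool does not grow without bound.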

aguscas commented 3 months ago

Hello! We are working on multi-camera support for Norfair in this pull request. It is still waiting for review before it can be merged, and that may take a while since the rest of the team is currently busy. If you don't want to wait, you can try the PR yourself, but keep in mind that nobody else on the team has tested it since I implemented it, so you might run into some problems. Don't hesitate to ask me if you need any help with that.

I made a demo where the user first uses a UI to associate coordinates between the different videos (creating a common reference frame for all of them), and that information is then used to match the trackers. Since you mention that there is practically no overlap between the regions recorded by your cameras, you should compare only the embeddings of the objects (i.e., how they look) rather than their spatial position.
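A single frame's embedding can be noisy (partial occlusion, blur, odd pose), so when comparing "how they look" across cameras it may help to summarize each track with a few recent embeddings. This is a hedged sketch of one common option, not what the demo does: `TrackAppearance`, `l2_normalize` and `maxlen=10` are hypothetical, and the descriptor is simply the re-normalized mean of the last few L2-normalized embeddings.

```python
import math
from collections import deque


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


class TrackAppearance:
    """Hypothetical helper: keeps the last `maxlen` embeddings of one track."""

    def __init__(self, maxlen=10):
        self.history = deque(maxlen=maxlen)  # oldest embeddings drop out automatically

    def add(self, embedding):
        self.history.append(l2_normalize(embedding))

    def descriptor(self):
        """Mean of the stored normalized embeddings, re-normalized; None if empty."""
        if not self.history:
            return None
        dim = len(self.history[0])
        count = len(self.history)
        mean = [sum(e[i] for e in self.history) / count for i in range(dim)]
        return l2_normalize(mean)


track = TrackAppearance(maxlen=10)
for emb in ([1.0, 0.0], [0.8, 0.6]):
    track.add(emb)
print(track.descriptor())  # unit vector between the two observed appearances
```

The descriptor, rather than the latest raw embedding, would then be what gets compared between cameras, so one bad frame is less likely to cause a missed or false match.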

For that you will need to make some adjustments to the demo. First, remove the parts where I set and use the initial_transformations variable (which defines the common reference frame). Second, the distance function used by the MultiCameraClusterizer should rely only on the embeddings, not on spatial position. In the demo I defined clusterier_distance, which first checks the spatial position with normalized_foot_distance and, only when the objects are close, compares their embeddings with the embedding_distance function.
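To make that adjustment concrete, here is a hedged sketch contrasting a position-gated distance with an appearance-only one. It does not reuse the PR's code: `TrackSummary`, `appearance_distance`, `gated_distance`, `appearance_only_distance`, the `foot` point and `max_foot_distance=50.0` are hypothetical stand-ins for the demo's clusterier_distance, normalized_foot_distance and embedding_distance. The exact signature MultiCameraClusterizer expects is not shown in this thread, so treat the function shapes as assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrackSummary:
    """Hypothetical per-track summary; not a Norfair class."""
    embedding: List[float]                     # appearance descriptor
    foot: Optional[Tuple[float, float]] = None  # (x, y) in a common frame, if one exists


def appearance_distance(a, b):
    """Cosine distance in [0, 2]; 2.0 if either embedding is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 2.0 if norm == 0 else 1.0 - dot / norm


def gated_distance(t1, t2, max_foot_distance=50.0):
    """Demo-like idea: compare appearance only when positions are close."""
    if math.hypot(t1.foot[0] - t2.foot[0], t1.foot[1] - t2.foot[1]) > max_foot_distance:
        return 2.0  # largest possible cosine distance: treat as different objects
    return appearance_distance(t1.embedding, t2.embedding)


def appearance_only_distance(t1, t2):
    """Non-overlapping cameras: positions are not comparable, so ignore them."""
    return appearance_distance(t1.embedding, t2.embedding)


a = TrackSummary([1.0, 0.0], (0.0, 0.0))
b = TrackSummary([1.0, 0.0], (500.0, 0.0))  # same look, far away in the common frame
print(gated_distance(a, b))            # 2.0
print(appearance_only_distance(a, b))  # 0.0
```

Without a common reference frame there are no comparable foot positions to gate on in the first place, so for cameras that don't overlap only the appearance-only variant makes sense.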

The output of the demo combines all the videos into a single video, drawing the bounding boxes of the tracked objects in each view, with the same id and color when they correspond to the same real object. Here is an example I made with that script using footage from the EPFL dataset; I'm sharing it because I haven't yet added a GIF to the README showing the expected output.

https://github.com/tryolabs/norfair/assets/70915567/bc3b0dde-6903-475d-aabe-9cd38bc41aab