tryolabs / norfair

Lightweight Python library for adding real-time multi-object tracking to any detector.
https://tryolabs.github.io/norfair/
BSD 3-Clause "New" or "Revised" License

reid - how to use reid? #296

Closed utility-aagrawal closed 5 months ago

utility-aagrawal commented 5 months ago

Hi,

I looked at the demo and explanation here: https://github.com/tryolabs/norfair/tree/master/demos/reid

At a high level, I understand what's going on, but could you explain what's happening in embedding_distance() here?: https://github.com/tryolabs/norfair/blob/009a1b171ab14336d79d7b7b02dfa5f45066c79e/demos/reid/src/demo.py#L15

I want to do something similar for face reid, and I have a way of representing faces using embeddings. I want to understand what you are doing here: which embeddings are you comparing against which? What kind of distance are you using? What are matched_not_init_trackers, unmatched_trackers, snd_embedding, etc.?

Appreciate your help with this! Thanks!

aguscas commented 5 months ago

Hello. I will provide a short and a long answer because I am not sure how much detail you want about what's going on. You don't need to worry much about what I write in the long answer, since a lot of that is just stuff that Norfair does on its own behind the scenes, but I thought it would help explain the names of the variables you were asking about.

The short answer:

The embedding_distance in that demo is just a distance between two TrackedObject instances: both matched_not_init_trackers and unmatched_trackers are single TrackedObject instances. I agree with what you are probably thinking: the plural in those names doesn't make sense, since each variable holds just one TrackedObject rather than a list of them.

In this case we are using the color histogram as the embedding (which we create in this line when defining the embedding), and the distance function is 1 - correlation of the histograms, as you can see in this line.

The snd_embedding variable in that function is just the embedding of one of the past detections of the unmatched_trackers variable, and detection_fst is one of the past detections of the matched_not_init_trackers variable.
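To make "1 - correlation" concrete, here is a minimal pure-Python sketch. It is only a stand-in for the cv2.compareHist call with cv2.HISTCMP_CORREL that the demo uses; the names correlation and histogram_distance are made up for this illustration, and the histograms are plain lists rather than OpenCV arrays.

```python
import math


def correlation(h1, h2):
    """Pearson correlation between two equally sized histograms."""
    if len(h1) != len(h2) or not h1:
        raise ValueError("histograms must be non-empty and the same length")
    mean1 = sum(h1) / len(h1)
    mean2 = sum(h2) / len(h2)
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    denominator = math.sqrt(
        sum((a - mean1) ** 2 for a in h1) * sum((b - mean2) ** 2 for b in h2)
    )
    # A flat histogram has zero variance; treating that case as
    # "no correlation" is a choice made for this sketch only.
    return numerator / denominator if denominator > 0 else 0.0


def histogram_distance(h1, h2):
    """Turn a similarity (correlation in [-1, 1]) into a distance in [0, 2]."""
    return 1.0 - correlation(h1, h2)


same = histogram_distance([1, 2, 3], [2, 4, 6])      # same shape, scaled
opposite = histogram_distance([1, 2, 3], [3, 2, 1])  # mirrored shape
print(same, opposite)
```

Histograms with the same shape give a distance close to 0, while mirrored ones give a distance close to 2.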

The long answer:

As you can see in the Detection class, you can provide an embedding with each detection. The Tracker stores the embeddings of the detections associated with the TrackedObjects that it generates.

We don't store the embedding of every detection associated with a tracked object, since that could require a lot of memory, but you can tell the Tracker how many past detections (and therefore embeddings) to keep available for each tracked object by setting the past_detections_length argument to whatever number you like.

Now I will try to explain this embedding_distance thing. The Tracker has two distances, which you can see in its arguments when defining the Tracker instance: the distance_function and the embedding_distance (in the demo, embedding_distance is passed as the Tracker's reid_distance_function argument).

The distance_function:

This one is used to match Detection instances with TrackedObject instances. So whenever you call the Tracker.update method passing your detections, Norfair will use that distance_function to see which detection corresponds to which object you were already tracking, and which detections might belong to a new object you have never seen.

It typically uses positional information (i.e., where the TrackedObject and the Detection are located in the image), especially considering that a TrackedObject created just a few frames ago might not have many embeddings available to compare with a Detection. Of course, you could use both positional information and information from the embeddings inside that distance_function if you wanted, but that is not the way we normally use it.

The embedding_distance:

This one is used to match two different TrackedObject instances. The thing is that the previous method with the distance_function might not work perfectly, and we might need to check if two different TrackedObject instances could actually correspond to the same object in real life.

Let me give an example: suppose the object suddenly and abruptly accelerates, so that the corresponding Detection and TrackedObject end up far from each other. In that case, the TrackedObject will not match that (or probably any) Detection. It becomes an "unmatched_tracker" and remains that way until it matches a Detection again, or until the Tracker destroys it after enough frames have passed without a match.

On the other hand, since that Detection didn't match any previous TrackedObject, the Tracker will create a new TrackedObject. This new TrackedObject will not yet be returned by the Tracker when you call the Tracker.update method, since the Tracker waits until it has evidence over many frames that the object actually exists and wasn't just a random false positive of your detector. Therefore, this new TrackedObject is what we called a "matched_not_init_tracker", and it remains one until the Tracker 'initializes' it (i.e., has enough evidence over many frames to consider it a real object and starts returning it from the Tracker.update method), or until the Tracker destroys it because it doesn't match future detections.

Using the embedding_distance, one might be able to recognize that a matched_not_init_tracker actually corresponds to an unmatched_tracker, and then 'merge' the new uninitialized tracker with the old unmatched tracker into a single TrackedObject instance with the id of the old one.

This embedding_distance usually uses information from the embeddings, since positional information might not be very useful: the TrackedObject was lost precisely because of the erratic movement of the object.
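The merge idea can be sketched as a toy. This is not Norfair's internal matching code: it is a greedy simplification with made-up names (merge_reidentified, scalar embeddings, SimpleNamespace trackers) that only shows how a new tracker could inherit the id of an old lost one when their distance falls below a threshold.

```python
from types import SimpleNamespace


def merge_reidentified(new_trackers, lost_trackers, reid_distance, threshold):
    """For each new tracker, adopt the id of the closest lost tracker
    whose distance is below the threshold (greedy, one-to-one)."""
    available = list(lost_trackers)
    for new in new_trackers:
        best, best_distance = None, threshold
        for old in available:
            distance = reid_distance(new, old)
            if distance < best_distance:
                best, best_distance = old, distance
        if best is not None:
            new.id = best.id          # keep the identity of the old object
            available.remove(best)    # each lost tracker is reused at most once
    return new_trackers


# Toy data: scalar "embeddings" so the distance is just an absolute difference.
lost = [SimpleNamespace(id=7, embedding=1.0), SimpleNamespace(id=3, embedding=5.0)]
new = [SimpleNamespace(id=None, embedding=0.9), SimpleNamespace(id=None, embedding=9.0)]

merge_reidentified(new, lost, lambda a, b: abs(a.embedding - b.embedding), threshold=0.5)
print([t.id for t in new])  # [7, None]
```

The first new tracker is close enough to lost tracker 7 and takes over its id; the second one has no close counterpart and stays a genuinely new object.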

The demo

Now I will answer your specific questions from that demo. In that demo, we are using just a color histogram of the bounding box (you can see it in the function get_hist, that is called in this line when defining the embedding).

The snd_embedding variable (read it as 'second embedding') in our embedding_distance is an embedding (color histogram) of what we called the old unmatched tracker in the previous explanation. So first we try to get the embedding of the last detection we had for that unmatched tracker. If that detection has no embedding available, we look through the list of past detections (remember that we store only a few of those), starting from the latest, until we find an embedding to put in that variable.

Similarly, we look for past embeddings available for our brand new not-yet-initialized TrackedObject, and compare them with the embedding of the old unmatched TrackedObject. The distance we use in that demo is based on the correlation of the color histograms (as you can see in this line, where we compare the histograms with cv2.HISTCMP_CORREL). To be precise, we use 1 - correlation, so that when the histograms match perfectly (correlation = 1) the distance is 0.

Of course that is just a simple example. You can define your own embedding_distance for your own embeddings.
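For the face reid case from the question, a custom embedding_distance could look roughly like the sketch below. It follows the same steps described above but swaps the histogram correlation for cosine distance, which is my assumption about a reasonable choice for face embeddings, not something the demo does. The attribute names last_detection, past_detections and embedding mirror Norfair's objects; NO_MATCH, MATCH_THRESHOLD and the function names are hypothetical, and the trackers are SimpleNamespace stand-ins so the snippet runs without Norfair.

```python
import math
from types import SimpleNamespace

NO_MATCH = 2.0         # hypothetical sentinel: the largest possible cosine distance
MATCH_THRESHOLD = 0.5  # hypothetical cut-off, to be tuned for your embeddings


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return NO_MATCH
    return 1.0 - dot / (norm_a * norm_b)


def latest_embedding(tracked_object):
    """Embedding of the last detection, else the most recent stored one."""
    if tracked_object.last_detection.embedding is not None:
        return tracked_object.last_detection.embedding
    for detection in reversed(tracked_object.past_detections):
        if detection.embedding is not None:
            return detection.embedding
    return None


def face_embedding_distance(new_tracker, lost_tracker):
    """Compare the lost tracker's latest embedding with the new tracker's past ones."""
    old_embedding = latest_embedding(lost_tracker)
    if old_embedding is None:
        return NO_MATCH
    for detection in new_tracker.past_detections:
        if detection.embedding is None:
            continue
        distance = cosine_distance(old_embedding, detection.embedding)
        if distance < MATCH_THRESHOLD:
            return distance
    return NO_MATCH


def det(embedding):
    """Stand-in for a Detection that only carries an embedding."""
    return SimpleNamespace(embedding=embedding)


lost = SimpleNamespace(last_detection=det(None), past_detections=[det([1.0, 0.0]), det(None)])
same_face = SimpleNamespace(past_detections=[det(None), det([0.9, 0.1])])
other_face = SimpleNamespace(past_detections=[det([0.0, 1.0])])

print(face_embedding_distance(same_face, lost))   # small value, below the threshold
print(face_embedding_distance(other_face, lost))  # NO_MATCH
```

Returning early on the first embedding that falls below the threshold mirrors what the demo does; taking the minimum over all stored embeddings would be an equally plausible design.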

utility-aagrawal commented 5 months ago

@aguscas, thanks a lot for such a great explanation! It makes perfect sense! I'll use this to come up with the logic for my tracking. Closing the issue for now; I will open a new one later if I have any further questions.

Again, I really appreciate the time you took to explain it! Have a great day!