utility-aagrawal closed this issue 9 months ago.
Hello. I will give a short and a long answer because I am not sure how much detail you want. You don't need to worry much about the long answer, since a lot of it is stuff that Norfair does on its own behind the scenes, but I thought it would help explain the names of the variables you were asking about.
The `embedding_distance` in that demo is just a distance between two `TrackedObject` instances (both `matched_not_init_trackers` and `unmatched_trackers` are `TrackedObject` instances). I agree with what you are thinking: the plural in the names doesn't make sense. These aren't lists of `TrackedObject` instances; each of these variables holds just one `TrackedObject` instance.
In this case we use a color histogram as the embedding (which we create in this line, when defining the embedding), and the distance is 1 - correlation of the two histograms, as you can see in this line.
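For reference, here is roughly what a color-histogram embedding looks like. The demo computes it with OpenCV on the bounding-box crop in its `get_hist` function. The pure-Python sketch below avoids that dependency. `color_histogram`, the list-of-pixels crop format and the 4-bins-per-channel layout are all assumptions made for illustration, not the demo's code.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (B, G, R), each 0..255, the channel order OpenCV uses


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Joint B/G/R histogram of a crop, normalized so the bins sum to 1.

    Hypothetical stand-in for the demo's OpenCV-based `get_hist`.
    """
    bin_width = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for b, g, r in pixels:
        # Flatten the 3D (b, g, r) bin coordinates into a single index.
        index = ((b // bin_width) * bins_per_channel + g // bin_width) * bins_per_channel + r // bin_width
        hist[index] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else hist


# A tiny "crop": three pure-blue pixels and one pure-red pixel.
embedding = color_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
```

Two crops of the same person should produce similar histograms, which is what makes the histogram usable as an embedding.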
The `snd_embedding` variable in that function is just the embedding of one of the past detections of the `unmatched_trackers` variable, and `detection_fst` is one of the past detections of the `matched_not_init_trackers` variable.
As you can see in the `Detection` class, you can attach an embedding to each detection. The `Tracker` then stores these embeddings in the `TrackedObject` instances that those detections get associated with.
We don't store the embeddings of every detection associated with a tracked object, since that could require a lot of memory. Instead, you can tell the `Tracker` how many past detections (and therefore embeddings) to keep for each tracked object by setting the `past_detections_length` argument to whatever number you like.
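The memory trade-off behind `past_detections_length` can be illustrated with a bounded buffer. The sketch below is a simplification: the class `TrackedObjectSketch` is hypothetical and simply keeps the newest N detections. Norfair's own policy for choosing which past detections to keep may differ.

```python
from collections import deque


class TrackedObjectSketch:
    """Hypothetical stand-in that stores at most N past detections."""

    def __init__(self, past_detections_length: int = 4):
        # A deque with maxlen drops its oldest entry automatically when full,
        # so memory per tracked object stays bounded no matter how long it lives.
        self.past_detections = deque(maxlen=past_detections_length)

    def add_detection(self, detection: dict) -> None:
        self.past_detections.append(detection)


obj = TrackedObjectSketch(past_detections_length=3)
for frame in range(10):
    obj.add_detection({"frame": frame, "embedding": [float(frame)]})
print([d["frame"] for d in obj.past_detections])  # [7, 8, 9]
```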
Now I will try to explain this `embedding_distance` thing. The `Tracker` uses two distances, which you can see in its arguments when defining the `Tracker` instance: the `distance_function` and the `reid_distance_function` (the demo passes its `embedding_distance` function as the latter).
`distance_function`: This one is used to match `Detection` instances with `TrackedObject` instances. Whenever you call the `Tracker.update` method with your detections, Norfair uses the `distance_function` to decide which detection corresponds to which object you were already tracking, and which detections might belong to a new object you have never seen.
It typically uses positional information (i.e., where the `TrackedObject` and the `Detection` are located in the image). This matters especially for `TrackedObject` instances created just a few frames ago: such an object might not have many embeddings available yet to compare with a `Detection`. Of course, you could combine positional information and embeddings inside the `distance_function` if you wanted, but that is not how we normally use it.
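Here is a rough picture of positional matching. It uses a plain Euclidean distance between a detection's point and a tracked object's estimated point, plus a threshold. The helpers `euclidean_distance` and `greedy_match` are hypothetical, and the greedy closest-pair-first assignment is a simplification. Norfair provides its own distance functions and matching logic, which are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def euclidean_distance(detection_xy: Point, tracked_xy: Point) -> float:
    """Positional distance between a detection and a tracked object's estimate."""
    return math.dist(detection_xy, tracked_xy)


def greedy_match(
    detections: List[Point], tracked: Dict[int, Point], threshold: float
) -> Dict[int, Optional[int]]:
    """Map each detection index to a tracked-object id, or None if nothing is close enough."""
    pairs = sorted(
        (euclidean_distance(det, pos), d_idx, obj_id)
        for d_idx, det in enumerate(detections)
        for obj_id, pos in tracked.items()
    )
    assignment: Dict[int, Optional[int]] = {i: None for i in range(len(detections))}
    used_objects = set()
    for dist, d_idx, obj_id in pairs:
        if dist > threshold:
            break  # pairs are sorted, so every remaining pair is even farther
        if assignment[d_idx] is None and obj_id not in used_objects:
            assignment[d_idx] = obj_id
            used_objects.add(obj_id)
    return assignment


# Detection 0 is next to object 1; detection 1 is far from everything,
# so it would start a new (not yet initialized) tracked object.
print(greedy_match([(12.0, 11.0), (300.0, 300.0)], {1: (10.0, 10.0), 2: (100.0, 100.0)}, threshold=50.0))
# {0: 1, 1: None}
```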
`embedding_distance`: This one is used to match two different `TrackedObject` instances. The matching done with the `distance_function` might not work perfectly, so we might need to check whether two different `TrackedObject` instances actually correspond to the same object in real life.
Let's take an example: suppose the object suddenly and abruptly accelerates, so that the corresponding `Detection` and `TrackedObject` end up far from each other. In that case, the `TrackedObject` will not match that (or probably any) `Detection`, becoming an "`unmatched_tracker`". It will remain that way until it matches a `Detection` again, or until the `Tracker` destroys it after enough frames have passed without a match.
On the other hand, since that `Detection` didn't match any previous `TrackedObject`, the `Tracker` will create a new `TrackedObject`. This new `TrackedObject` will not yet be returned when you call the `Tracker.update` method, because the `Tracker` waits for evidence over several frames that the object actually exists and isn't just a random false positive of your detector. This new `TrackedObject` is therefore what we called a "`matched_not_init_tracker`". It remains one until the `Tracker` initializes it (i.e., has enough evidence over several frames to consider it a real object and starts returning it from `Tracker.update`), or until the `Tracker` destroys it because it doesn't match future detections.
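Both life cycles in this example can be modeled with a single evidence counter that goes up on matches and down on misses. The dataclass below is a hypothetical, simplified sketch. Its field names echo Norfair's `Tracker` arguments (`hit_counter_max`, `initialization_delay`), but Norfair's real update rule differs in detail.

```python
from dataclasses import dataclass


@dataclass
class TrackLifecycleSketch:
    """Hypothetical, simplified bookkeeping for one tracked object."""

    hit_counter: int = 0
    hit_counter_max: int = 10
    initialization_delay: int = 3
    is_initializing: bool = True

    def step(self, matched: bool) -> None:
        # Gain evidence on a match (capped), lose it on a frame without one.
        if matched:
            self.hit_counter = min(self.hit_counter + 1, self.hit_counter_max)
        else:
            self.hit_counter -= 1
        # With enough evidence the object is "initialized" and would be returned.
        if self.is_initializing and self.hit_counter > self.initialization_delay:
            self.is_initializing = False

    @property
    def destroyed(self) -> bool:
        return self.hit_counter < 0


# A brand-new object: a "matched_not_init_tracker" while it is still initializing.
new_obj = TrackLifecycleSketch()
history = []
for _ in range(5):
    new_obj.step(matched=True)
    history.append(new_obj.is_initializing)
print(history)  # [True, True, True, False, False]

# An established object that stops matching: an "unmatched_tracker" until destroyed.
old_obj = TrackLifecycleSketch(hit_counter=3, is_initializing=False)
frames_without_match = 0
while not old_obj.destroyed:
    old_obj.step(matched=False)
    frames_without_match += 1
print(frames_without_match)  # 4
```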
One might then recognize, by using the `embedding_distance`, that a `matched_not_init_tracker` actually corresponds to an `unmatched_tracker`. The new uninitialized tracker can then be merged with the old unmatched tracker into a single `TrackedObject` instance that keeps the id of the old one.
This `embedding_distance` usually relies on the embeddings. Positional information might not be very useful here, because the `TrackedObject` was lost precisely due to the object's erratic movement.
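The merge step can be sketched as follows. The new track is compared with each unmatched track using an embedding-based distance, and if the closest one is under a threshold, the old track keeps its id and takes over the new track's state. `TrackSketch`, `reid_merge` and the scalar "embedding" are hypothetical simplifications, not Norfair's implementation. Note that the example positions are far apart on purpose: only the embeddings decide the merge.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TrackSketch:
    """Hypothetical stand-in for a TrackedObject."""

    id: Optional[int]              # None while the object is still initializing
    position: Tuple[float, float]
    embedding: float               # scalar stand-in for a real embedding vector


def reid_merge(
    new_track: TrackSketch,
    unmatched_tracks: List[TrackSketch],
    distance_fn: Callable[[TrackSketch, TrackSketch], float],
    threshold: float,
) -> Optional[TrackSketch]:
    """Merge new_track into the closest unmatched track; return it, or None if no merge."""
    if not unmatched_tracks:
        return None
    distances = [distance_fn(new_track, old) for old in unmatched_tracks]
    best_index = min(range(len(distances)), key=distances.__getitem__)
    if distances[best_index] > threshold:
        return None
    old = unmatched_tracks[best_index]
    # The old track survives with its original id and the new track's latest state.
    old.position = new_track.position
    old.embedding = new_track.embedding
    return old


old = TrackSketch(id=7, position=(10.0, 10.0), embedding=0.30)
new = TrackSketch(id=None, position=(400.0, 50.0), embedding=0.32)
merged = reid_merge(new, [old], lambda a, b: abs(a.embedding - b.embedding), threshold=0.1)
print(merged.id, merged.position)  # 7 (400.0, 50.0)
```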
Now I will answer your specific questions about that demo. In that demo, the embedding is just a color histogram of the bounding box (you can see it in the function `get_hist`, which is called in this line when defining the embedding).
The `snd_embedding` variable (read it as "second embedding") in our `embedding_distance` is an embedding (color histogram) of what we called the old unmatched tracker in the previous explanation. First we try to get the embedding of the last detection we had for that unmatched tracker. If that detection has no embedding, we look through the list of past detections (remember that we store only a few of those), starting from the latest, until we find one with an embedding to put in that variable.
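That lookup order (last detection first, then past detections from newest to oldest) can be written as a small helper. The stub classes mirror the `last_detection`, `past_detections` and `embedding` attribute names the demo reads from Norfair objects. `DetectionStub`, `TrackedObjectStub` and `latest_embedding` themselves are hypothetical, so the sketch runs without Norfair.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class DetectionStub:
    embedding: Optional[Sequence[float]] = None


@dataclass
class TrackedObjectStub:
    last_detection: DetectionStub
    past_detections: List[DetectionStub] = field(default_factory=list)  # oldest first


def latest_embedding(obj: TrackedObjectStub) -> Optional[Sequence[float]]:
    """Return the most recent available embedding, or None if there is none."""
    if obj.last_detection.embedding is not None:
        return obj.last_detection.embedding
    # Walk the few stored past detections from newest to oldest.
    for detection in reversed(obj.past_detections):
        if detection.embedding is not None:
            return detection.embedding
    return None


obj = TrackedObjectStub(
    last_detection=DetectionStub(None),
    past_detections=[DetectionStub([1.0]), DetectionStub([2.0]), DetectionStub(None)],
)
print(latest_embedding(obj))  # [2.0]
```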
Similarly, we look for past embeddings available for our brand-new, not-yet-initialized `TrackedObject` and compare them with the embedding of the old unmatched `TrackedObject`. The distance we use in that demo is based on the correlation of the color histograms (as you can see in this line, where we compare the histograms with `cv2.HISTCMP_CORREL`). To be precise, we use 1 - correlation, so that when the histograms match perfectly (correlation = 1) the distance is 0.
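The 1 - correlation distance can be reproduced in plain Python. OpenCV's `HISTCMP_CORREL` computes the Pearson correlation of the two histograms, so the distance below ranges from 0 (identical shape) to 2 (perfectly anti-correlated). `correlation_distance` is a hypothetical helper. Returning 1.0 for a flat histogram, whose correlation is undefined, is an assumption rather than OpenCV's documented behavior.

```python
import math
from typing import Sequence


def correlation_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """1 - Pearson correlation between two equal-length histograms."""
    n = len(h1)
    if n == 0 or n != len(h2):
        raise ValueError("histograms must be non-empty and the same length")
    mean1, mean2 = sum(h1) / n, sum(h2) / n
    numerator = sum((a - mean1) * (b - mean2) for a, b in zip(h1, h2))
    denominator = math.sqrt(
        sum((a - mean1) ** 2 for a in h1) * sum((b - mean2) ** 2 for b in h2)
    )
    if denominator == 0:
        return 1.0  # assumption: a flat histogram counts as "no correlation"
    return 1.0 - numerator / denominator


print(correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 0.0: same shape, different scale
print(correlation_distance([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # 2.0: opposite shape
```

Because correlation ignores overall scale, two crops of different sizes but similar color distribution still come out close.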
Of course, that is just a simple example. You can define your own `embedding_distance` for your own embeddings.
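For face re-ID with vector embeddings, a common choice is cosine distance instead of histogram correlation. The sketch below keeps the demo's two-argument shape: one not-yet-initialized object and one unmatched object. It returns the smallest cosine distance between any pair of their stored embeddings. The stub classes, `face_embedding_distance` and its fallback value are hypothetical. In Norfair you would pass such a function as `reid_distance_function`, the way the demo passes `embedding_distance`.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class DetectionStub:
    embedding: Optional[Sequence[float]] = None


@dataclass
class TrackedObjectStub:
    past_detections: List[DetectionStub] = field(default_factory=list)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for same direction, up to 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def face_embedding_distance(
    matched_not_init_tracker: TrackedObjectStub,
    unmatched_tracker: TrackedObjectStub,
    no_embedding_distance: float = 2.0,
) -> float:
    """Smallest cosine distance between any pair of stored face embeddings."""
    new_embs = [d.embedding for d in matched_not_init_tracker.past_detections if d.embedding is not None]
    old_embs = [d.embedding for d in unmatched_tracker.past_detections if d.embedding is not None]
    distances = [cosine_distance(a, b) for a in new_embs for b in old_embs]
    # With no embeddings to compare, return the maximum cosine distance (never a match).
    return min(distances, default=no_embedding_distance)


new = TrackedObjectStub([DetectionStub([1.0, 0.0]), DetectionStub(None)])
old = TrackedObjectStub([DetectionStub([0.0, 1.0]), DetectionStub([1.0, 0.0])])
print(face_embedding_distance(new, old))  # 0.0
```

Taking the minimum over pairs makes the match robust to a few bad frames (blur, occlusion) in either object's history.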
@aguscas, thanks a lot for such a great explanation! It makes perfect sense! I'll use this to come up with the logic for my tracking. Closing the issue for now; I will open a new one later if I have any further questions.
Again, I really appreciate the time you took to explain it! Have a great day!
Hi,
I looked at the demo and explanation here: https://github.com/tryolabs/norfair/tree/master/demos/reid
At a high level, I understand what's going on, but could you explain what's happening in `embedding_distance()` here? https://github.com/tryolabs/norfair/blob/009a1b171ab14336d79d7b7b02dfa5f45066c79e/demos/reid/src/demo.py#L15
I want to do something similar for face re-ID, and I have a way of representing faces as embeddings. I want to understand what you are doing here: which embeddings are you comparing against which? What kind of distance are you using? What are `matched_not_init_trackers`, `unmatched_trackers`, `snd_embedding`, etc.?
Appreciate your help with this! Thanks!