nwojke / deep_sort

Simple Online Realtime Tracking with a Deep Association Metric
GNU General Public License v3.0

Cross-video Feature Matching by re-using old tracks.features #254

Open MaxLievense opened 3 years ago

MaxLievense commented 3 years ago

Hi guys,

I am trying to adjust the code that saves the features of a track so that they are stored at the end of a video and re-loaded in the next one. The crux is that this has to work unsupervised.

My current plan is to change the "Dead" status of a track to "Saved". Before a new track is initialized, the features of the saved tracks would be checked against the unmatched detections, so that a detection can be linked to an old track instead. This would be an alternative to post-hoc image-to-video referencing of track sequences.
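To make the plan concrete, here is a minimal sketch of how such a "Saved" gallery could be written to disk at the end of one video, re-loaded for the next, and queried before a new track is created. None of this is part of deep_sort: `save_gallery`, `load_gallery`, `reidentify`, and the `max_distance` threshold are hypothetical names and values chosen for illustration, and cosine distance is assumed as the appearance metric.

```python
import os
import tempfile

import numpy as np


def save_gallery(path, gallery):
    """Store {track_id: (n_i, d) feature array} in a single .npz file."""
    np.savez(path, **{f"track_{tid}": feats for tid, feats in gallery.items()})


def load_gallery(path):
    """Inverse of save_gallery: rebuild the {track_id: features} dict."""
    with np.load(path) as data:
        return {int(key.split("_")[1]): data[key] for key in data.files}


def min_cosine_distance(feature, feats):
    """Smallest cosine distance between one feature and a track's stored features."""
    feature = feature / np.linalg.norm(feature)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return float(np.min(1.0 - feats @ feature))


def reidentify(feature, gallery, max_distance=0.2):
    """Return the id of the closest saved track, or None if none is close enough.

    max_distance is an assumed threshold and would need tuning on real data.
    """
    best_id, best_dist = None, max_distance
    for tid, feats in gallery.items():
        dist = min_cosine_distance(feature, feats)
        if dist < best_dist:
            best_id, best_dist = tid, dist
    return best_id
```

The idea would be to call `reidentify` for every detection that is left unmatched after the normal association step, and only initialize a new track when it returns `None`.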

From my understanding of the code:

  1. The first cost matrix is built from the features and targets by the nearest-neighbor metric (`NearestNeighborDistanceMetric`), so it is based on the appearance of the subjects.
  2. That matrix is then converted to a gated cost matrix (this is the step that leads to `gating_distance`).
  3. `matching_cascade` (which calls `min_cost_matching`) solves the assignment over the cost matrix.
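The three steps above can be sketched roughly as follows. This is a simplified stand-in and not the repository's code: the function names mirror the ones mentioned above but are re-implemented here, the gate is passed in as a boolean `admissible` mask instead of being derived from the Kalman filter, `INFTY_COST` and `max_distance` are assumed values, and the age-based cascade over tracks is left out.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

INFTY_COST = 1e5  # illustrative cost for pairs ruled out by the gate


def appearance_cost(track_features, det_features):
    """cost[i, j] = smallest cosine distance between track i's features and detection j."""
    dets = det_features / np.linalg.norm(det_features, axis=1, keepdims=True)
    cost = np.zeros((len(track_features), len(dets)))
    for i, feats in enumerate(track_features):
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        cost[i] = (1.0 - feats @ dets.T).min(axis=0)
    return cost


def gate(cost, admissible):
    """Overwrite inadmissible (track, detection) pairs with a very large cost."""
    gated = cost.copy()
    gated[~admissible] = INFTY_COST
    return gated


def min_cost_matching(cost, max_distance=0.2):
    """Solve the assignment and drop pairs whose cost exceeds max_distance."""
    capped = np.minimum(cost, max_distance + 1e-5)
    rows, cols = linear_sum_assignment(capped)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if capped[r, c] <= max_distance]
    matched_tracks = {r for r, _ in matches}
    matched_dets = {c for _, c in matches}
    unmatched_tracks = [i for i in range(cost.shape[0]) if i not in matched_tracks]
    unmatched_dets = [j for j in range(cost.shape[1]) if j not in matched_dets]
    return matches, unmatched_tracks, unmatched_dets
```

The unmatched detections returned by the last step are the ones that would normally start new tracks, so that is where a lookup against saved tracks could be hooked in.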

For this implementation to work I had a few questions:

  1. Is it possible to adjust the weights/costs that decide when a new track is created, so that the tracker prefers re-using a "Saved" track over initializing a new one?
  2. Can the feature list be summarized, i.e. collapsed to a few key features (such as front, side, and back views)? Otherwise, a detector that classifies the direction of the subject would also work.
  3. How can I increase the dimension of the feature extractor's output from 128 to more?
  4. Is there another filter/matcher that would be more suitable for such a task?
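For question 2, one possible way to collapse a track's feature list is farthest-point sampling: keep picking the stored feature that is least similar to the ones already chosen. This is only a sketch under assumptions. `select_key_features` is a hypothetical helper, cosine distance is assumed, and nothing guarantees that the selected features correspond to front, side, and back views; they are simply the most mutually different ones.

```python
import numpy as np


def select_key_features(features, k=3):
    """Pick up to k mutually dissimilar rows from an (n, d) feature array."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Start with the feature closest to the mean direction of the track.
    chosen = [int(np.argmax(feats @ feats.mean(axis=0)))]
    # min_dist[i] = cosine distance from feature i to its nearest chosen feature.
    min_dist = 1.0 - feats @ feats[chosen[0]]
    while len(chosen) < min(k, len(feats)):
        nxt = int(np.argmax(min_dist))
        if min_dist[nxt] <= 1e-12:
            break  # everything left duplicates an already chosen feature
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, 1.0 - feats @ feats[nxt])
    return features[chosen]
```

A clustering method such as k-means over the features would be an alternative with a similar goal; which one works better here is something I have not tested.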

Any help would be appreciated! Kind regards, Max

studentbrad commented 3 years ago

I'll address these in order.

  1. It is possible to do this, although a "saved" track is usually referred to as a "lost" track. However, I doubt this will give you the functionality you are looking for. It is preferable to have four states: new, tracked, lost, and removed. Tracks that are "tracked" take precedence, then "lost", and finally "new".
  2. I do not know enough about the feature extractor here.
  3. You can do this by using a different network model. Each model has a predetermined feature size.
  4. A minimum-cost perfect matching is optimal by definition: the sum of the weights of the matched rows and columns is minimal.
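To illustrate the precedence from point 1, here is a rough sketch of a four-state scheme with staged matching. It is hypothetical and not deep_sort's code: the repository's own `TrackState` uses Tentative, Confirmed, and Deleted, while the enum below, `MATCH_PRIORITY`, and `staged_matching` are made-up names, and `match_fn` stands for whatever assignment routine you use.

```python
from enum import IntEnum


class TrackState(IntEnum):
    """Hypothetical four-state scheme (not the states deep_sort ships with)."""
    NEW = 0
    TRACKED = 1
    LOST = 2
    REMOVED = 3


# Order in which existing tracks get a chance to claim detections.
MATCH_PRIORITY = [TrackState.TRACKED, TrackState.LOST, TrackState.NEW]


def staged_matching(tracks, detections, match_fn):
    """Match detections stage by stage, in priority order.

    tracks:     dict mapping track id -> TrackState
    detections: list of detection ids
    match_fn:   callable(track_ids, detection_ids) -> list of (track, detection)
    """
    remaining = list(detections)
    matches = []
    for state in MATCH_PRIORITY:
        candidates = [tid for tid, s in tracks.items() if s == state]
        stage = match_fn(candidates, remaining)
        matches.extend(stage)
        used = {det for _, det in stage}
        remaining = [det for det in remaining if det not in used]
    # Detections still left here are the ones that would start new tracks.
    return matches, remaining
```

Removed tracks never appear in any stage, so they cannot claim a detection. Lost tracks only see the detections that the tracked ones did not take.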

lakshaydulani commented 2 years ago

@MaxLievense I want to achieve the same thing. Do you have a solution?