Considering Mahalanobis-distance when assigning detections to tracked objects

The authors of the original DeepSort paper combine the Mahalanobis distance (motion information from the Kalman filter) and the cosine distance (appearance similarity from deep embeddings) in a weighted sum (eq. 5 in the paper):

$$c_{i,j} = \lambda \, d^{(1)}(i, j) + (1 - \lambda) \, d^{(2)}(i, j)$$

where $d^{(1)}(i, j)$ and $d^{(2)}(i, j)$ are the Mahalanobis and cosine distances, respectively. They then say that "during our experiments we found that setting λ = 0 is a reasonable choice when there is substantial camera motion". Indeed, in the source code of the tracker this lambda parameter is not implemented at all: the Mahalanobis distance to the Kalman filter prediction is only used to gate the cosine distances (I believe this happens here: https://github.com/nwojke/deep_sort/blob/280b8bdb255f223813ff4a8679f3e1321b08cdfc/deep_sort/tracker.py#L99).

My use case is tracking vehicles with a fixed-position camera. In this scenario, including the Kalman filter prediction in the distance matrix could be beneficial: the cars may look similar, but their motion is quite predictable. At the very least, it would be interesting to experiment with nonzero values of lambda and see whether tracking improves.
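To make the idea concrete, here is a minimal sketch of the cost matrix I have in mind. It is not code from the repository: the function name `combined_cost`, its parameters, and the 0.2 cosine gate are my own assumptions for illustration. The value 9.4877 is the 0.95 chi-square quantile for 4 degrees of freedom, which the paper uses as the Mahalanobis gate. The sketch assumes the two distance matrices (tracks by detections) have already been computed.

```python
import numpy as np

# 0.95 chi-square quantile for a 4-dimensional measurement space
# (the Mahalanobis gating threshold given in the paper).
CHI2_95_4DOF = 9.4877
# Large cost marking inadmissible pairs (hypothetical value).
INFTY_COST = 1e5


def combined_cost(maha, cosine, lam=0.5,
                  maha_gate=CHI2_95_4DOF, cos_gate=0.2):
    """Weighted sum of eq. 5 with gating on both metrics.

    maha, cosine: arrays of shape (num_tracks, num_detections) holding the
    squared Mahalanobis distances and the cosine distances.
    lam: the lambda weight; lam=0 gives appearance-only costs.
    cos_gate: assumed appearance threshold, not taken from the paper.
    """
    maha = np.asarray(maha, dtype=float)
    cosine = np.asarray(cosine, dtype=float)
    if maha.shape != cosine.shape:
        raise ValueError("distance matrices must have the same shape")

    # c_ij = lambda * d1(i, j) + (1 - lambda) * d2(i, j)
    cost = lam * maha + (1.0 - lam) * cosine

    # A pair is admissible only if it passes both gates.
    admissible = (maha <= maha_gate) & (cosine <= cos_gate)
    cost[~admissible] = INFTY_COST
    return cost


if __name__ == "__main__":
    maha = [[1.0, 20.0], [4.0, 2.0]]
    cosine = [[0.1, 0.05], [0.3, 0.1]]
    print(combined_cost(maha, cosine, lam=0.5))
```

Note that the two metrics live on different scales (the gated Mahalanobis term can reach about 9.5, while the cosine distance stays below the gate), so in this plain form even a small lambda lets the motion term dominate. Dividing the Mahalanobis distances by the gate before mixing might be a reasonable alternative, but that is my own guess and not something from the paper.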

Any hints?
