No appearance embedding is used?

ifzhang / ByteTrack

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box

MIT License


No appearance embedding is used? #47

Closed: JunweiLiang closed this issue 2 years ago

JunweiLiang commented 2 years ago

Hi, according to the code example at https://github.com/ifzhang/ByteTrack#combining-byte-with-other-detectors, the tracker takes no appearance embedding as input. Could you confirm that your tracking algorithm outperforms DeepSORT and TMOT (https://github.com/Zhongdao/Towards-Realtime-MOT), which use appearance embeddings, even though it does not use embeddings to compute similarities between tracklets? I haven't checked the paper yet; I'm just looking for a quick answer. Many thanks! :)
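For context, trackers such as DeepSORT and TMOT (JDE) compare the appearance embedding of each tracklet with that of each new detection, typically via cosine distance. The pure-Python sketch below shows only that similarity step. The function names are hypothetical, each track holds a single embedding (DeepSORT, for instance, keeps a gallery of past embeddings per track), and greedy matching stands in for the optimal linear assignment those trackers actually solve.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance_matrix(
    track_embs: Sequence[Sequence[float]], det_embs: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Return 1 - cosine similarity for every (track, detection) pair."""
    tracks = [l2_normalize(e) for e in track_embs]
    dets = [l2_normalize(e) for e in det_embs]
    return [[1.0 - sum(a * b for a, b in zip(t, d)) for d in dets] for t in tracks]


def greedy_match(cost: List[List[float]], max_cost: float) -> List[Tuple[int, int]]:
    """Greedily take the cheapest unused (track, detection) pair with cost <= max_cost.

    Simplification: DeepSORT and JDE solve an optimal linear assignment instead.
    """
    candidates = sorted(
        (c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row) if c <= max_cost
    )
    used_tracks, used_dets, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_tracks and j not in used_dets:
            matches.append((i, j))
            used_tracks.add(i)
            used_dets.add(j)
    return sorted(matches)


# Two tracks and two detections whose appearance matches the tracks in swapped order.
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
dets = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]
print(greedy_match(cosine_distance_matrix(tracks, dets), max_cost=0.5))  # [(0, 1), (1, 0)]
```

Note that nothing here looks at box positions: two objects that look alike can be confused, while an object that jumps far between frames can still be matched, which is the trade-off the thread goes on to discuss.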

Mohamed209 commented 2 years ago

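In the paper, the authors mention that embeddings are not important for their algorithm. From my reading of the paper and from using the code base in my projects, the Kalman filter uses only location and motion to predict where each track will be in the next frame, and IDs are then assigned by matching detections to those predictions.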
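The motion-only association described in this comment can be sketched as: predict each track's box with a constant-velocity model, then match detections to the predicted boxes by IoU. This is a simplified, hypothetical sketch: it propagates only the state mean (no covariance or Kalman gain), uses greedy matching, and the `min_iou=0.3` cutoff is an assumed value. ByteTrack itself uses a full Kalman filter and an optimal linear-assignment solver.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Track:
    track_id: int
    box: Box                              # last observed box
    velocity: Box = (0.0, 0.0, 0.0, 0.0)  # per-frame change of each coordinate

    def predict(self) -> Box:
        # Constant-velocity step: what a Kalman predict does to the state mean.
        # Covariance and Kalman gain are deliberately omitted in this sketch.
        return tuple(b + v for b, v in zip(self.box, self.velocity))

    def update(self, det: Box) -> None:
        # Simplification: trust the detection fully instead of blending via a gain.
        self.velocity = tuple(d - b for d, b in zip(det, self.box))
        self.box = det


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: List[Track], dets: List[Box], min_iou: float = 0.3) -> List[Tuple[int, int]]:
    """Match detections to predicted boxes by IoU only; return (track_id, det_index) pairs."""
    preds = [t.predict() for t in tracks]
    pairs = sorted(
        ((iou(p, d), i, j) for i, p in enumerate(preds) for j, d in enumerate(dets)),
        reverse=True,  # highest overlap first
    )
    used_t, used_d, matches = set(), set(), []
    for score, i, j in pairs:
        if score < min_iou:
            break
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    for i, j in matches:
        tracks[i].update(dets[j])
    return sorted((tracks[i].track_id, j) for i, j in matches)


# A track moving right by about 10 px per frame keeps its ID on the shifted box.
track = Track(track_id=7, box=(10.0, 0.0, 20.0, 10.0), velocity=(10.0, 0.0, 10.0, 0.0))
dets = [(100.0, 100.0, 110.0, 110.0), (21.0, 0.0, 31.0, 10.0)]
print(associate([track], dets))  # [(7, 1)]
```

Because the match depends entirely on the predicted box overlapping the new detection, a sudden camera pan or a large gap between frames can push the prediction off target, which is where appearance cues help.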

JunweiLiang commented 2 years ago

@ifzhang Any insights would be greatly appreciated. I'm really surprised that appearance features do not help.

qwe1444 commented 2 years ago

We replaced the detector with Darknet YOLOv4 and compared ByteTrack and DeepSORT on the MOT20 training set; DeepSORT performed better than ByteTrack. Whether appearance embeddings are useful is still an open question.

ifzhang commented 2 years ago

BYTE can be combined with embeddings and can sometimes achieve better results than when only the Kalman filter is used. You can see some examples in the tutorials, such as FairMOT and CSTrack. In some videos with fast camera motion or low FPS, embeddings are more accurate than the Kalman filter. We do not use embeddings because we want faster inference. BTW, embeddings tend to perform better on the training set (e.g. MOT20) because they can overfit it; however, performance drops on the test set because of the domain gap.
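One way BYTE could be combined with embeddings is sketched below, under stated assumptions. In the first association, all tracks are matched to high-score detections using a weighted blend of IoU distance and appearance cosine distance; in the second, the tracks left over are matched to low-score detections by IoU alone. The thresholds (0.5 / 0.1), the cost limits (0.8 / 0.5), the `emb_weight` blend, and all function names are illustrative assumptions, not values or code taken from the ByteTrack tutorials, and greedy matching again replaces optimal assignment.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_dist(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (nu * nv)


def greedy(cost: Callable[[int, int], float], track_idx, det_idx, max_cost: float):
    """Greedy stand-in for optimal assignment: cheapest unused pairs first."""
    pairs = sorted((cost(i, j), i, j) for i in track_idx for j in det_idx)
    used_t, used_d, matches = set(), set(), []
    for c, i, j in pairs:
        if c > max_cost:
            break
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches


def byte_associate(
    track_boxes: List[Box],  # Kalman-predicted boxes of existing tracks
    det_boxes: List[Box],
    det_scores: List[float],
    track_embs: Optional[List[Sequence[float]]] = None,
    det_embs: Optional[List[Sequence[float]]] = None,
    high_thresh: float = 0.5,  # illustrative values, not taken from the repo
    low_thresh: float = 0.1,
    emb_weight: float = 0.5,
):
    """Return (matched (track, det) pairs, unmatched high-score det indices)."""
    high = [j for j, s in enumerate(det_scores) if s >= high_thresh]
    low = [j for j, s in enumerate(det_scores) if low_thresh <= s < high_thresh]
    use_emb = track_embs is not None and det_embs is not None

    def first_cost(i: int, j: int) -> float:
        c = 1.0 - iou(track_boxes[i], det_boxes[j])
        if use_emb:
            # Blend motion (IoU) distance with appearance (cosine) distance.
            c = (1.0 - emb_weight) * c + emb_weight * cosine_dist(track_embs[i], det_embs[j])
        return c

    # Stage 1: all tracks vs. high-score detections.
    first = greedy(first_cost, range(len(track_boxes)), high, max_cost=0.8)
    matched_tracks = {i for i, _ in first}
    remaining = [i for i in range(len(track_boxes)) if i not in matched_tracks]

    # Stage 2: leftover tracks vs. low-score detections, IoU only, since low-score
    # boxes are often occluded or blurred and their appearance is less reliable.
    second = greedy(lambda i, j: 1.0 - iou(track_boxes[i], det_boxes[j]), remaining, low, max_cost=0.5)

    # Unmatched high-score detections would start new tracks; unmatched low-score ones are dropped.
    matched_dets = {j for _, j in first}
    unmatched_high = [j for j in high if j not in matched_dets]
    return sorted(first + second), unmatched_high


tracks = [(0.0, 0.0, 10.0, 10.0), (50.0, 50.0, 60.0, 60.0)]
dets = [(1.0, 0.0, 11.0, 10.0), (50.0, 51.0, 60.0, 61.0), (200.0, 200.0, 210.0, 210.0)]
scores = [0.9, 0.3, 0.8]
print(byte_associate(tracks, dets, scores))  # ([(0, 0), (1, 1)], [2])
```

Passing no embeddings (or `emb_weight=0`) reduces the first stage to IoU-only matching, the motion-only setting chosen for speed. A large `emb_weight` lets appearance override motion, which helps in the fast-camera-motion or low-FPS cases mentioned above but is also where an embedding overfit to the training set can hurt on unseen data.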

JunweiLiang commented 2 years ago

@ifzhang Thanks! That is very insightful.