Closed JunweiLiang closed 2 years ago
In the paper, the authors mention that embeddings are not important for their algorithm. From my reading of the paper and from using the code base in my projects, the tracker relies on the Kalman filter, which uses only location and motion to assign IDs.
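To make the "location and motion only" point concrete, here is a minimal sketch of association by box overlap alone. It assumes the Kalman filter has already produced a predicted box for each track; the function names (`iou`, `greedy_iou_match`) and the 0.3 threshold are hypothetical, and greedy matching is swapped in for simplicity in place of an optimal assignment solver, so this is not the repository's implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_iou_match(predicted_boxes, detections, iou_threshold=0.3):
    """Pair each predicted track box with at most one detection.

    Candidate pairs are visited in order of decreasing IoU; a pair is
    accepted if neither side is already matched. No appearance feature
    is used anywhere. The threshold is an illustrative assumption.
    Returns a list of (track_index, detection_index) tuples.
    """
    candidates = [
        (iou(t, d), ti, di)
        for ti, t in enumerate(predicted_boxes)
        for di, d in enumerate(detections)
    ]
    candidates.sort(reverse=True)

    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in candidates:
        if score < iou_threshold:
            break  # all remaining pairs overlap even less
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


if __name__ == "__main__":
    predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]   # Kalman-predicted track boxes
    detected = [(21, 21, 31, 31), (1, 1, 11, 11)]    # detector output, different order
    print(sorted(greedy_iou_match(predicted, detected)))
```

With these inputs each track is paired with the detection that overlaps it, even though the detections arrive in a different order than the tracks.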
@ifzhang Any insights would be greatly appreciated. I'm really surprised that appearance features do not help.
We replaced the detector with darknet-yolov4 and compared ByteTrack and DeepSORT on the MOT20 training set; DeepSORT performed better than ByteTrack. Whether appearance embedding is useful is still an open question.
BYTE can be combined with embedding, and this sometimes achieves better results than using Kalman alone. You can see some examples in the tutorials, such as FairMOT and CSTrack. In some videos with fast camera motion or low fps, embedding is more accurate than Kalman. We do not use embedding because we want faster inference speed. BTW, embedding tends to perform better on the training set (e.g. MOT20) because it can overfit the training set. However, performance drops on the test set because of the domain gap.
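As a rough illustration of what combining a motion cue with an embedding can look like, the sketch below blends an IoU-based cost with a cosine distance between appearance vectors. The names `cosine_distance` and `fused_cost`, the `motion_weight` parameter, and the linear blend itself are assumptions made for illustration; FairMOT and CSTrack each define their own fusion, which this does not reproduce.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors (1.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def fused_cost(iou_value, track_emb, det_emb, motion_weight=0.5):
    """Blend a motion cost (1 - IoU) with an appearance cost.

    motion_weight=1.0 reduces to IoU-only association;
    motion_weight=0.0 uses the embedding alone. The linear blend is an
    illustrative assumption, not a published formula.
    """
    motion_cost = 1.0 - iou_value
    appearance_cost = cosine_distance(track_emb, det_emb)
    return motion_weight * motion_cost + (1.0 - motion_weight) * appearance_cost


if __name__ == "__main__":
    # Fast camera motion: the predicted box no longer overlaps the detection
    # (IoU = 0), but the appearance vectors still agree.
    print(fused_cost(0.0, [1.0, 0.0], [1.0, 0.0], motion_weight=1.0))  # IoU only
    print(fused_cost(0.0, [1.0, 0.0], [1.0, 0.0], motion_weight=0.5))  # blended
```

In the fast-motion case the IoU-only cost is at its maximum, while the blended cost is lower because the embeddings match, which is the situation described above where embedding helps.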
@ifzhang Thanks! That is very insightful.
Hi, according to this code example: https://github.com/ifzhang/ByteTrack#combining-byte-with-other-detectors there is no appearance embedding input for the tracker. Could you confirm that your tracking algorithm, without using appearance embedding to compute similarities between tracklets, is better than DeepSORT and TMOT (https://github.com/Zhongdao/Towards-Realtime-MOT), which do use it? I have not checked the paper yet, just looking for a quick answer. Many thanks! :)