timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
https://arxiv.org/abs/2101.02702
Apache License 2.0

The setting of `prev_query_embed` in `DeformableTransformer` #54


liuqk3 commented 2 years ago

Hi, thanks for your great work!

I found that the `prev_query_embed` of the track queries in `deformable_transformer.py`

https://github.com/timmeinhardt/trackformer/blob/df70fef0539dc6ebe8ed26bf1ce55dd6e8f87968/src/trackformer/models/deformable_transformer.py#L214

is set to zeros. However, the `query_embed` of the detection queries is learned end-to-end and is in fact the positional embedding. Why did you choose this setting? From the commented lines (lines 215-220), it seems you tried different settings for `prev_tgt` and `prev_query_embed`. Does performance differ much between these settings?
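
For context, the asymmetry I am asking about looks roughly like this (a simplified, runnable sketch following the Deformable DETR convention of splitting one learned embedding into a positional half and a target half; the shapes and stand-in tensors are mine, not the repository's code):

```python
import torch
import torch.nn as nn

hidden_dim, num_queries, num_track_queries = 256, 300, 10

# Detection queries: a single learned embedding, split in the forward pass
# into a positional part (query_embed) and a target part (tgt).
query_embeds = nn.Embedding(num_queries, hidden_dim * 2)
query_embed, tgt = torch.split(query_embeds.weight, hidden_dim, dim=1)

# Track queries: the target is carried over from the previous frame,
# but the positional part is simply set to all zeros.
prev_tgt = torch.randn(num_track_queries, hidden_dim)  # stand-in for previous-frame outputs
prev_query_embed = torch.zeros_like(prev_tgt)
```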

timmeinhardt commented 2 years ago

The `query_embed` is the encoding that allows the decoder to differentiate between the object queries. We add a zero encoding to the track queries since they are already refined and hence more easily distinguishable for the decoder. However, I agree this might not be the ideal solution. We tried learning fixed track query encodings and adding the query output from the previous frame as the encoding, but neither gave better results.
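
To make those alternatives concrete, here is a minimal sketch of the three variants (the helper names are illustrative only, not the repository's API; `prev_hs` stands in for the previous frame's decoder hidden states):

```python
import torch
import torch.nn as nn

hidden_dim, max_track_queries = 256, 100

# Variant used in the repository: zero positional encoding for track queries.
def zero_encoding(prev_tgt):
    return torch.zeros_like(prev_tgt)

# Alternative 1: a separate learned, fixed encoding per track query slot.
track_query_embed = nn.Embedding(max_track_queries, hidden_dim)
def learned_encoding(prev_tgt):
    return track_query_embed.weight[: prev_tgt.size(0)]

# Alternative 2: reuse the previous frame's decoder output as the encoding.
def prev_output_encoding(prev_hs):
    return prev_hs

prev_tgt = torch.randn(10, hidden_dim)
prev_query_embed = zero_encoding(prev_tgt)  # the setting discussed above
```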