timmeinhardt / trackformer

Implementation of "TrackFormer: Multi-Object Tracking with Transformers". [Conference on Computer Vision and Pattern Recognition (CVPR), 2022]
https://arxiv.org/abs/2101.02702
Apache License 2.0
502 stars 116 forks

Use of args.multi_frame_attention #67

Open tragians opened 1 year ago

tragians commented 1 year ago

Hi @timmeinhardt, thanks so much for this great work!

While trying to reproduce the results for MOTS20, I noticed some differences between your DeformableDETR and the DETR implementations.

Could you explain the use of args.multi_frame_attention in the adjusted DeformableDETR? I'm wondering why it is not used in the DETR-based model for mask tracking.

Is multi-frame attention not necessary to utilise track queries in the model? I read Section 4.2 of the paper, but I'm still a bit confused.

timmeinhardt commented 1 year ago

We provide the MOTS20 results for the old model because deformable attention seemed to perform worse for segmentation. Multi-frame and multi-scale training were not part of the old model. However, there is no reason why multi-frame attention could not work for segmentation.

tragians commented 1 year ago

Thank you very much for your detailed answer!

I have a follow-up question on the slightly modified Transformer class you introduce. I was wondering what the parameter track_attention is for and whether it was used during training.

timmeinhardt commented 1 year ago

The track_attention parameter is a legacy option and was not used in any of the training runs.