lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Apache License 2.0

Deformable Attention #373

Open janmarczak opened 1 month ago

janmarczak commented 1 month ago

Hi! Thank you for your great work. I was looking at the code and I see that deformable attention is only used in the cross-attention Decoder module.

Why is deformable attention not used anywhere else (for example in the encoder)?

Also, what is the difference between your AIFI module and the original DETR self-attention in the encoder? Correct me if I am wrong, but in the original DETR the last feature map was also the only input to the encoder.

Thanks!

lyuwenyu commented 1 month ago
  1. In the encoder, we only operate on the highest-level features. To preserve accuracy we use self-attention there (the input sequence is relatively short, so the impact on speed is also relatively small).
  2. Compared to the original version, we introduce multi-scale features.
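To make the point concrete: flattening only the top-level map keeps the token count small enough that plain (quadratic) self-attention is affordable. A minimal numpy sketch of single-scale self-attention over a flattened feature map (illustrative only; shapes, weight names, and the single-head layout are my assumptions, not the repo's actual AIFI code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def top_level_self_attention(feat, Wq, Wk, Wv):
    """Plain self-attention over the flattened highest-level feature map.

    feat: (H, W, C) top-level feature map. Flattening yields only H*W
    tokens, so full quadratic attention stays cheap at this scale.
    """
    H, W, C = feat.shape
    tokens = feat.reshape(H * W, C)             # (HW, C) token sequence
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))        # (HW, HW) attention weights
    out = attn @ v                              # attended tokens
    return out.reshape(H, W, C)                 # back to a feature map

# toy example: a 20x20 top-level map with 32 channels (hypothetical sizes)
rng = np.random.default_rng(0)
C = 32
feat = rng.standard_normal((20, 20, C))
Wq, Wk, Wv = (0.1 * rng.standard_normal((C, C)) for _ in range(3))
out = top_level_self_attention(feat, Wq, Wk, Wv)
print(out.shape)  # (20, 20, 32)
```

At a lower pyramid level (e.g. 4x the spatial size) the token count, and hence the attention matrix, grows 16x, which is why restricting self-attention to the top level keeps the encoder real-time.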
janmarczak commented 1 month ago

Thank you for your reply!

I have one more question. How is the uncertainty from the paper calculated, and how do we get P(X) and C(X)? I couldn't find any information on it, so any details or a brief explanation would be very helpful to me! Thank you

[image: screenshot of the uncertainty formulation from the paper]
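For readers following along, a minimal sketch of one plausible reading of the paper's U(X) = ||P(X) - C(X)||: P(X) as the per-query classification confidence and C(X) as an IoU-style localization confidence. This interpretation and all the numbers below are my assumptions, not a confirmed answer from the authors:

```python
import numpy as np

def uncertainty(p_cls, c_loc):
    """Hypothetical per-query uncertainty: the gap between the
    classification confidence P(X) and the localization confidence C(X)
    (e.g. IoU with the matched box). Not the official implementation."""
    return np.abs(p_cls - c_loc)

p = np.array([0.9, 0.8, 0.3])   # classification scores per query (made up)
c = np.array([0.85, 0.4, 0.3])  # IoU-style localization scores (made up)
u = uncertainty(p, c)
print(u)  # the middle query is "uncertain": confident class, poor box
```

Under this reading, queries whose classification and localization confidences disagree get a large uncertainty, and query selection would prefer the low-uncertainty ones.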