nickgkan / butd_detr

Code for the ECCV22 paper "Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds"

About Positional Embeddings used #51

Closed MnCSSJ4x closed 3 months ago

MnCSSJ4x commented 3 months ago

Hi. I noticed that the paper says you use sinusoidal (non-learnable) positional embeddings, as in DETR. In the code, however, you use PositionEmbeddingLearned from modules.py, which consists of 1D convolutions and depends on xyz in the forward pass (and is hence learnable). I also see that this class is used with different parameters for both the box encoder and the cross-modal encoding. How do these two statements relate, or am I missing the sinusoidal encodings somewhere? Additionally, I notice that PositionEmbeddingLearned is defined identically in two files. Is there a reason for this duplication?

ayushjain1144 commented 3 months ago

Hi,

[Screenshot: excerpt from the paper describing the positional embeddings]

As mentioned in the excerpt above, in 3D we use learned positional embeddings with XYZ as input, same as in the GroupFree model. About the duplication: no specific reason; that might just be an oversight.
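For readers tracing the code, a learned XYZ-conditioned position embedding of the kind described above can be sketched as follows. This is a minimal sketch in the style of the GroupFree-style module referenced in the thread, not the repository's exact code; the layer sizes and the `num_pos_feats` default are assumptions for illustration.

```python
import torch
import torch.nn as nn


class PositionEmbeddingLearned(nn.Module):
    """Learned position embedding over XYZ coordinates.

    A small 1D-conv head maps each point's (x, y, z) to a feature
    vector, so the embedding is both xyz-dependent and learnable
    (unlike fixed sinusoidal embeddings).
    """

    def __init__(self, input_channel=3, num_pos_feats=288):
        super().__init__()
        self.position_embedding_head = nn.Sequential(
            nn.Conv1d(input_channel, num_pos_feats, kernel_size=1),
            nn.BatchNorm1d(num_pos_feats),
            nn.ReLU(inplace=True),
            nn.Conv1d(num_pos_feats, num_pos_feats, kernel_size=1),
        )

    def forward(self, xyz):
        # xyz: (B, N, 3) point coordinates
        # Conv1d expects channels first, so transpose to (B, 3, N).
        xyz = xyz.transpose(1, 2).contiguous()
        # Output: (B, num_pos_feats, N), one embedding per point.
        return self.position_embedding_head(xyz)
```

Because the head takes raw coordinates as input, instantiating it with different `input_channel`/`num_pos_feats` values is what allows the same class to serve both the box encoder and the cross-modal encoder, as noted in the question.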