Closed XiaoyuShi97 closed 3 years ago
Hi. Yes, the positionally encoded feature map is fed to the transformer encoder as the query and key, while the original feature map serves as the values. We mainly followed the design of DETR for the backbone and transformer.
Hi. According to formula (4) in your paper, you add the positional encoding P to obtain the context feature map c. But in your code, you follow the standard transformer practice of adding the positional encoding only to the key and query, keeping the value clean. Did I miss anything?
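For anyone else reading this thread, the design described in the reply can be sketched as a single-head attention step where the positional encoding enters only the query and key. This is a minimal NumPy illustration of the DETR-style convention, not the repository's actual code; the function and variable names are made up for clarity.

```python
import numpy as np

def detr_style_attention(feat, pos):
    """Single-head self-attention, DETR-style:
    positional encoding is added to query and key only,
    while the value is the original (unencoded) feature map."""
    q = feat + pos   # query: feature map + positional encoding
    k = feat + pos   # key:   feature map + positional encoding
    v = feat         # value: original feature map, no encoding

    # scaled dot-product attention with a stable softmax
    scores = q @ k.T / np.sqrt(feat.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

feat = np.random.randn(5, 8)  # 5 tokens, feature dim 8
pos = np.random.randn(5, 8)   # positional encoding, same shape
out = detr_style_attention(feat, pos)
print(out.shape)  # (5, 8)
```

So the positional signal influences where attention looks (the similarity scores), but the aggregated content itself stays the clean feature map, which matches the behavior the question points out in the code.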