naver-ai / egtr

[CVPR 2024 Best paper award candidate] EGTR: Extracting Graph from Transformer for Scene Graph Generation
https://arxiv.org/abs/2404.02072
Apache License 2.0

About the self-attention of DETR decoder #9

Open jiugexuan opened 1 month ago

jiugexuan commented 1 month ago

The paper says: "We propose a novel lightweight relation extractor, EGTR, which exploits the self-attention of DETR decoder, as depicted in Fig. 3. Since the self-attention weights in Eq. (1) contain N × N bidirectional relationships among the N object queries, our relation extractor aims to extract the predicate information from the self-attention weights in the entire L layers, by considering the attention queries and keys as subjects and objects, respectively." Is the self-attention of the DETR decoder the masked multi-head attention layer of the original Transformer decoder?

jinbae commented 1 month ago

Please refer to the DETR paper. The self-attention of the DETR decoder is not masked, which makes it different from the (masked) self-attention of the original Transformer decoder:

The difference with the original transformer is that our model decodes the N objects in parallel at each decoder layer,
while Vaswani et al. [47] use an autoregressive model that predicts the output sequence one element at a time.
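To make the difference concrete, here is a minimal toy sketch (small made-up sizes, not the repo's actual code) of DETR-style decoder self-attention: all N object queries attend to each other in parallel with no causal mask, so the softmax(QKᵀ/√d) weights form a full N × N map in which row i (the attention query) can be read as a subject and column j (the attention key) as an object:

```python
import math
import random

random.seed(0)
N, d = 3, 4  # toy sizes: 3 object queries, hidden dim 4

# random projected attention queries Q and keys K for the N object queries
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(N)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Eq. (1)-style scaled dot-product scores: every pair (i, j) gets a
# weight -- no causal mask, since the N queries are decoded in parallel.
scores = [[sum(Q[i][t] * K[j][t] for t in range(d)) / math.sqrt(d)
           for j in range(N)] for i in range(N)]
attn = [softmax(row) for row in scores]

# the attention map is N x N: subject i -> object j relationships
print(len(attn), len(attn[0]))  # 3 3
```

In a masked (autoregressive) decoder, entries with j > i would be forced to zero; here the full matrix is available, which is what lets EGTR read bidirectional relationships out of it.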
jiugexuan commented 1 month ago

So the q, k used for relation extraction come from the first attention layer (the self-attention) in each Transformer decoder layer? [image] From here?

jinbae commented 1 month ago

Yes, that's right.
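As a rough illustration of the quoted sentence about "the entire L layers" (hypothetical shapes and names, not the repo's implementation): if each of the L decoder layers yields an N × N self-attention map, the per-pair relation feature can be formed by stacking the L weights for pair (i, j) into an L-dimensional vector:

```python
N, L = 3, 6  # toy sizes: 3 object queries, 6 decoder layers

# pretend each decoder layer l produced an N x N self-attention map;
# uniform dummy weights stand in for real softmax outputs
per_layer = [[[1.0 / N] * N for _ in range(N)] for _ in range(L)]

# relation feature for pair (subject i, object j): the L-dim vector of
# that pair's attention weight across all L decoder layers
relation_feats = [[[per_layer[l][i][j] for l in range(L)]
                   for j in range(N)] for i in range(N)]

print(len(relation_feats), len(relation_feats[0]), len(relation_feats[0][0]))  # 3 3 6
```

The actual EGTR relation extractor is richer than this (it works from the attention queries and keys themselves, per Fig. 3), but the indexing above shows how an N × N × L tensor of pairwise evidence arises from the self-attention sublayers alone.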