naver-ai / egtr

[CVPR 2024 Best paper award candidate] EGTR: Extracting Graph from Transformer for Scene Graph Generation
https://arxiv.org/abs/2404.02072
Apache License 2.0

Why not just use final-layer self-attention weights for relationship extraction in EGTR #13

Open Catchip opened 1 week ago

Catchip commented 1 week ago

The idea of extracting relationships from self-attention weights is indeed very inspiring! However, I have some questions. I should first clarify that my understanding of DETR is not very deep, but as I understand it, the object queries output by the later layers are more accurate: DETR typically regresses the final bounding boxes from the last layer's object queries. So, in EGTR, why don't you directly use the self-attention weights from the final layer for relationship extraction? Have you conducted any ablation studies on this?

jinbae commented 1 day ago

We conducted a preliminary investigation to determine the optimal $k$ when using only the last $k$ self-attention layers (not included in the paper). While the differences in results were not substantial, using all layers proved to be the best choice.

We employ a gating mechanism for each layer, which lets us indirectly assess each layer's importance through its gate value. The experimental results can be found in Supplementary Figure 2. Interestingly, the gate value for the first self-attention layer (before any cross-attention layer is applied) was notably high.
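For intuition, here is a minimal numpy sketch of the general idea of gating per-layer self-attention maps: each decoder layer's query-to-query attention map gets a learnable sigmoid gate, and the gated maps are combined into relation scores. The function name, the plain weighted average, and the scalar-per-layer gates are illustrative assumptions; EGTR's actual relation head is more elaborate than this.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_relation_scores(attn_maps, gate_logits):
    """Combine per-layer self-attention maps with per-layer gates.

    attn_maps:   list of L arrays, each (N, N) -- self-attention weights
                 among the N object queries at one decoder layer.
    gate_logits: array of L scalars -- learnable gate parameters, one per
                 layer. After training, sigmoid(gate_logits[l]) indicates
                 how much layer l contributes (the kind of signal reported
                 in Supplementary Figure 2).
    Returns a (N, N) gated average of the layer attention maps.
    (Hypothetical sketch; not the actual EGTR relation head.)
    """
    gates = sigmoid(np.asarray(gate_logits, dtype=np.float64))
    combined = sum(g * a for g, a in zip(gates, attn_maps))
    return combined / gates.sum()
```

With extreme gate logits the combination collapses toward a single layer's map, which is how inspecting the gates reveals which layers the relation extractor actually relies on.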