Open yandun72 opened 1 year ago
Hi @yandun72. If the shape of x is (batch, seq, hidden_size), you can permute it to (seq, batch, hidden_size) or set batch_first=True.
Sorry that the description of cross-attention confused you. In BatchFormer, we apply attention across the batch dimension, so the cross-attention is not a separate attention mechanism but ordinary Transformer attention; we just want to emphasize the batch dimension. You can think of it as cross-batch attention.
Regards,
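One way to read the reply is the following sketch (my own illustrative code, not the official BatchFormer implementation; the sizes batch, seq, hidden_size, n_head are made-up examples): with batch_first=True, nn.TransformerEncoderLayer attends over dim 1, so permuting x to (seq, batch, hidden_size) puts the batch axis in the attended position, giving cross-batch attention for a sequence input.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the thread).
batch, seq, hidden_size, n_head = 8, 16, 32, 4

# With batch_first=True, nn.TransformerEncoderLayer attends over dim 1.
# Feeding x permuted to (seq, batch, hidden_size) therefore makes the
# *batch* axis the attended one -- i.e. cross-batch attention.
layer = nn.TransformerEncoderLayer(
    d_model=hidden_size, nhead=n_head,
    dim_feedforward=hidden_size, dropout=0.5,
    batch_first=True,
)
layer.eval()  # disable dropout for a deterministic forward pass

x = torch.randn(batch, seq, hidden_size)
out = layer(x.permute(1, 0, 2))   # (seq, batch, hidden): attention across batch
out = out.permute(1, 0, 2)        # back to (batch, seq, hidden_size)
assert out.shape == (batch, seq, hidden_size)
```

Each sequence position then mixes information with the same position of the other samples in the batch; whether to share or restrict this across positions is a design choice.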
Thanks for your reply! I've got it!
Hi, I know that in TransformerEncoderLayer(C, 4, C, 0.5) the arguments C, 4, C mean d_model, n_head, and dim_feedforward,
and that x.unsqueeze(1) gives a tensor of shape (N, 1, C).
Because batch_first is False for the Transformer, it does self-attention over the batch dimension, but I am confused by what you call cross attention in the paper. I can't find the cross attention in the pseudo code; can you give me an interpretation of it? Also, what if x has shape (batch, seq, hidden_size)? That is the shape for an NER task. How should BatchFormer be applied in that situation? Hoping for your sincere reply!
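For reference, the mechanism the question describes can be sketched as runnable code (my reading of the paper's pseudo code, not the official repo; the sizes N, C and the label setup are illustrative assumptions): x.unsqueeze(1) gives (N, 1, C), and with the default batch_first=False the encoder attends over dim 0, which here is the batch axis N.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the thread).
N, C = 8, 32

# d_model=C, n_head=4, dim_feedforward=C, dropout=0.5, as in the question.
encoder = nn.TransformerEncoderLayer(C, 4, C, 0.5)
encoder.eval()  # disable dropout for a deterministic forward pass

def batchformer(x, y, encoder, is_training=True):
    """Sketch of the BatchFormer pseudo code for (N, C) features."""
    if not is_training:
        return x, y
    pre_x = x
    # (N, C) -> (N, 1, C); with batch_first=False the attended axis is
    # dim 0 = N, so attention runs across the batch ("cross-batch").
    x = encoder(x.unsqueeze(1)).squeeze(1)
    # Keep both original and batch-attended features for the shared head.
    x = torch.cat([pre_x, x], dim=0)
    y = torch.cat([y, y], dim=0)
    return x, y

x = torch.randn(N, C)
y = torch.randint(0, 10, (N,))
x2, y2 = batchformer(x, y, encoder)
assert x2.shape == (2 * N, C) and y2.shape == (2 * N,)
```

At inference time the module is skipped entirely (is_training=False returns the inputs unchanged), so no cross-batch dependence remains at test time.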