zhihou7 / BatchFormer

CVPR2022, BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning, https://arxiv.org/abs/2203.01522

Question on squeeze(1) #11

Closed pone7 closed 1 year ago

pone7 commented 1 year ago

The shape of the input features for a transformer is generally (batch, tokens, dim). As stated in the paper, BatchFormer performs attention at the batch level, but the input shape of this attention layer is (batch, 1, dim) via a squeeze operation. I am wondering whether its shape should instead be (1, batch, dim)? Maybe I misunderstood something. Looking forward to your reply!

zhihou7 commented 1 year ago

Hi, thanks for your interest. It is because torch.nn.TransformerEncoderLayer by default expects input of shape (L, B, C), i.e. the batch dimension is the second dim. So with the (batch, 1, dim) input you mention, the mini-batch samples occupy the sequence dimension and attention is computed across the batch.
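For illustration, here is a minimal sketch of the shape handling (the dimensions and encoder configuration are hypothetical, chosen only to show how (N, 1, C) is interpreted; it is not copied from the repository code):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
batch_size, feat_dim = 8, 512

# nn.TransformerEncoderLayer defaults to batch_first=False,
# i.e. it expects input of shape (L, B, C) = (sequence, batch, channels).
encoder = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=4,
                                     dim_feedforward=feat_dim, dropout=0.5)

x = torch.randn(batch_size, feat_dim)  # per-image features: (N, C)
x_in = x.unsqueeze(1)                  # (N, 1, C): N is read as the sequence length,
                                       # 1 as the batch size, so attention runs
                                       # across the N samples in the mini-batch.
out = encoder(x_in).squeeze(1)         # back to (N, C)
print(out.shape)                       # torch.Size([8, 512])
```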

Feel free to comment if you have further questions.

Regards,