leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using Pytorch

How to calculate the relative positional embeddings from a row offset and column offset? #8

Open songkq opened 5 years ago

songkq commented 5 years ago

Thanks for sharing the great idea. While reading the paper, I ran into some questions about how to compute the relative positional embeddings, i.e. r_(a−i, b−j), from a row offset a−i and a column offset b−j. Is there an explicit formula for this calculation? Looking forward to your reply.
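
For reference, the explicit formula in the paper (as far as I can read it) is the spatially relative attention below, where N_k(i, j) is the k x k neighborhood around pixel (i, j) and the row embedding r_(a−i) and column embedding r_(b−j) each have dimension d_out / 2:

```latex
% Spatially relative attention as stated in the paper (my transcription).
y_{ij} = \sum_{a,b \in \mathcal{N}_k(i,j)}
         \operatorname{softmax}_{ab}\!\left(
             q_{ij}^{\top} k_{ab} + q_{ij}^{\top} r_{a-i,\,b-j}
         \right) v_{ab}
\qquad \text{with} \qquad
r_{a-i,\,b-j} = \big[\, r_{a-i} \,;\, r_{b-j} \,\big]
```

So r_(a−i, b−j) is just the concatenation of a row-offset embedding and a column-offset embedding, and the only new term relative to plain attention is the extra logit q_ij^T r_(a−i, b−j) added before the softmax.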

leaderj1001 commented 5 years ago

Thanks for your comment! This implementation follows the relative positional embedding approach of the following paper.

Attention Augmented Convolutional Networks Link

Thank you.
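
For what it's worth, here is a minimal sketch (not the repository's actual code; `d` and `k` are placeholder values) of how the relative term q_ij^T r_(a−i, b−j) can be computed with two learnable tables, one per row offset and one per column offset. Because r_(a−i, b−j) is the concatenation of a row half and a column half, the relative logit splits into a row term plus a column term:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, not taken from the repository:
d = 64          # per-head key/query dimension (must be even)
k = 7           # local neighborhood (kernel) size

# One learnable embedding per possible row offset and per possible column offset.
# Offsets a-i and b-j each take k distinct values in [-(k//2), k//2].
rel_rows = nn.Parameter(torch.randn(k, d // 2))   # r_{a-i}, row half
rel_cols = nn.Parameter(torch.randn(k, d // 2))   # r_{b-j}, column half

q_ij = torch.randn(d)                              # query at pixel (i, j)

# r_{a-i, b-j} is the concatenation of the row and column embeddings,
# so q^T r splits into a row term and a column term.
row_logits = rel_rows @ q_ij[: d // 2]             # shape (k,)
col_logits = rel_cols @ q_ij[d // 2 :]             # shape (k,)
rel_logits = row_logits[:, None] + col_logits[None, :]   # shape (k, k)

# rel_logits[a, b] == q_ij^T r_{a-i, b-j}; it is added to the content
# logit q_ij^T k_ab before the softmax over the k x k neighborhood.
```

This split into a row term and a column term is why two separate h/w tensors are enough to represent the 2D relative position.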

Jimmy880 commented 4 years ago

@leaderj1001 Hello, I wonder whether you implemented the relative position embedding in the self-attention. I notice that you just use two random tensors to represent position in the h and w directions, while the paper says "The row and column offsets are associated with an embedding r_{a−i} and r_{b−j}". I am confused about whether the so-called "embedding" should be implemented as the nn.Embedding operation in PyTorch.

siyuan2018 commented 4 years ago

> @leaderj1001 Hello, I wonder whether you implemented the relative position embedding in the self-attention. I notice that you just use two random tensors to represent position in the h and w directions, while the paper says "The row and column offsets are associated with an embedding r_{a−i} and r_{b−j}". I am confused about whether the so-called "embedding" should be implemented as the nn.Embedding operation in PyTorch.

Hi, I am having the same question here. Did you figure out how to compute the relative position embedding? Thanks
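
An `nn.Embedding` table and a learnable `nn.Parameter` tensor indexed by the (shifted) offset are equivalent here; both are just trainable lookup tables over the k possible offsets. A rough sketch under assumed sizes (`d` and `k` are placeholders, not the repository's values):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
d = 64                      # per-head dimension (split in half for row/col)
k = 7                       # neighborhood size, offsets in [-(k//2), k//2]

# nn.Embedding is a learnable lookup table, so indexing it with
# (offset + k//2) is equivalent to indexing a learnable nn.Parameter tensor.
row_embed = nn.Embedding(k, d // 2)    # r_{a-i}
col_embed = nn.Embedding(k, d // 2)    # r_{b-j}

offsets = torch.arange(k) - k // 2     # possible values of a-i (and of b-j)
idx = offsets + k // 2                 # shift into [0, k-1] for table lookup

r_rows = row_embed(idx)                # (k, d//2), one embedding per row offset
r_cols = col_embed(idx)                # (k, d//2), one embedding per column offset

# Full r_{a-i, b-j} for every (a, b) in the neighborhood: concatenate the
# row half and the column half, giving a (k, k, d) tensor.
r = torch.cat(
    [r_rows[:, None, :].expand(k, k, d // 2),
     r_cols[None, :, :].expand(k, k, d // 2)],
    dim=-1,
)
```

If the two randomly initialized h/w tensors in the repository are registered as learnable parameters, they play the same role as these embedding lookups: they start random but are learned during training, one vector per row offset and per column offset.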