leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using Pytorch
MIT License

A question about relative position embeddings #25

Open 787629504 opened 3 years ago

787629504 commented 3 years ago

Hello, I want to ask you a question: in the relative position embeddings, why is the number of channels divided by 2 (out_channels // 2)?

```python
self.rel_h = nn.Parameter(torch.randn(out_channels // 2, 1, 1, kernel_size, 1), requires_grad=True)
self.rel_w = nn.Parameter(torch.randn(out_channels // 2, 1, 1, 1, kernel_size), requires_grad=True)
```

Jingtianci commented 3 years ago

The channels are divided into two halves: the first half carries the relative height information (rel_h), and the other half carries the relative width information (rel_w).
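
For context, here is a minimal sketch (with made-up tensor shapes) of how such a split is typically applied to an unfolded key tensor of shape (batch, channels, H, W, kernel, kernel): half the channels are offset by the height embedding and half by the width embedding, then the two halves are concatenated back.

```python
import torch
import torch.nn as nn

# Hypothetical shapes for illustration only.
batch, out_channels, height, width, kernel_size = 2, 64, 8, 8, 3

# Each half of the channel dimension gets its own relative embedding axis.
rel_h = nn.Parameter(torch.randn(out_channels // 2, 1, 1, kernel_size, 1))
rel_w = nn.Parameter(torch.randn(out_channels // 2, 1, 1, 1, kernel_size))

# Dummy unfolded key tensor: (batch, channels, H, W, kernel, kernel).
k_out = torch.randn(batch, out_channels, height, width, kernel_size, kernel_size)

# Split channels in half: the first half is shifted by the row (height)
# embedding, the second half by the column (width) embedding.
k_out_h, k_out_w = k_out.split(out_channels // 2, dim=1)
k_out = torch.cat((k_out_h + rel_h, k_out_w + rel_w), dim=1)

print(k_out.shape)  # torch.Size([2, 64, 8, 8, 3, 3])
```

So dividing by 2 simply lets one embedding encode the vertical offset and the other the horizontal offset, while the concatenated result still has out_channels channels.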