leaderj1001 / Stand-Alone-Self-Attention

Implementing Stand-Alone Self-Attention in Vision Models using PyTorch

Problems with groups #19

Open liqi0126 opened 4 years ago

liqi0126 commented 4 years ago

Shouldn't key_conv, query_conv, and value_conv be defined with a groups parameter? I'm not clear on what the groups parameter does in the following lines.

https://github.com/leaderj1001/Stand-Alone-Self-Attention/blob/a983f0f643632b1f2b7b8b27693182f22e9e574c/attention.py#L43-L50

mrngbaeb commented 4 years ago

The 'groups' here are not like the groups in grouped convolutions. In this case, each of the multi-head attention heads takes the full input dimension (not 1/n_groups of it). Computationally, this is done as a single convolution whose output is then reshaped to separate the heads.
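For illustration, a minimal sketch of that pattern (names like `n_heads` are illustrative, not from attention.py):

```python
import torch
import torch.nn as nn

# The Conv2d below uses groups=1, so every head sees the full input
# dimension; the separate heads only appear after the reshape.
batch, in_ch, out_ch, n_heads, h, w = 2, 16, 32, 4, 8, 8
query_conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

x = torch.randn(batch, in_ch, h, w)
q = query_conv(x)                                    # (batch, out_ch, h, w)
q = q.view(batch, n_heads, out_ch // n_heads, h, w)  # split channels into heads
```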

liqi0126 commented 4 years ago

However, it seems that not separating k_out, v_out, and q_out into groups gives the same results in the code above. So what exactly does the groups parameter do?

zenghui420 commented 3 years ago

Seems like q_out and k_out should be matrix-multiplied along dim=2 (the one of size self.out_channels // self.groups).
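A sketch of what that change might look like, assuming the tensor shapes produced by the reshapes in the linked lines (q_out is (b, g, c, h, w, 1), k_out/v_out are (b, g, c, h, w, k), where g = groups, c = out_channels // groups, and k is the neighborhood size):

```python
import torch
import torch.nn.functional as F

b, g, c, h, w, k = 2, 4, 8, 5, 5, 9
q_out = torch.randn(b, g, c, h, w, 1)
k_out = torch.randn(b, g, c, h, w, k)
v_out = torch.randn(b, g, c, h, w, k)

# Sum over the per-group channel dim (dim=2) when forming the logits,
# so each group/head gets one attention distribution over the neighborhood.
logits = (q_out * k_out).sum(dim=2)          # (b, g, h, w, k)
attn = F.softmax(logits, dim=-1)             # softmax over the k positions
out = torch.einsum('bghwk,bgchwk->bgchw', attn, v_out).reshape(b, g * c, h, w)
```

With the channel sum in place, different values of groups would actually change the attention weights, so groups would behave like attention heads.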

xyecoding commented 2 years ago

According to the listed code lines, it seems that self.groups has no effect on the result. I believe that no matter how self.groups is set, the code only realizes the groups=1 case.
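A quick numeric check of that observation, using random tensors shaped like the intermediates in the linked lines (a sketch, not the repo's code). Because the elementwise product and the softmax over the last dim are per-channel, the groups reshape is undone by the final view:

```python
import torch
import torch.nn.functional as F

def attend(q_out, k_out, v_out, groups):
    # Reproduces the elementwise-product form from the linked lines:
    # no sum over channels before the softmax.
    b, ch, h, w, k = k_out.shape
    k_ = k_out.view(b, groups, ch // groups, h, w, k)
    v_ = v_out.view(b, groups, ch // groups, h, w, k)
    q_ = q_out.view(b, groups, ch // groups, h, w, 1)
    out = F.softmax(q_ * k_, dim=-1)
    return torch.einsum('bnchwk,bnchwk->bnchw', out, v_).reshape(b, -1, h, w)

b, ch, h, w, k = 2, 8, 5, 5, 9
q = torch.randn(b, ch, h, w)
k_out = torch.randn(b, ch, h, w, k)
v_out = torch.randn(b, ch, h, w, k)

# Same output for any groups value that divides ch:
print(torch.allclose(attend(q, k_out, v_out, 1), attend(q, k_out, v_out, 4)))  # True
```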