zhmiao / OpenLongTailRecognition-OLTR

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)
BSD 3-Clause "New" or "Revised" License

Why is nn.Linear instead of nn.Conv2d used to compute the spatial attention? #15

Closed jchhuang closed 5 years ago

jchhuang commented 5 years ago

Hi, when reading the code I noticed that nn.Linear, rather than nn.Conv2d, is used to compute the spatial attention. Why is nn.Linear adopted here? Could you explain the reason, or the advantages of nn.Linear over nn.Conv2d? Thanks.

zhmiao commented 5 years ago

Hello @jchhuang, we actually followed this paper (https://arxiv.org/pdf/1711.07971.pdf) to build the self-attention module, please check it out. It has a better explanation of this.
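
For readers following along, here is a minimal sketch of a non-local-style self-attention block in the spirit of that paper. The module and variable names are illustrative, not the exact OLTR code. The key point for this issue is that an nn.Linear applied to flattened (H*W, C) features acts independently at every spatial position, so it is equivalent to a 1x1 nn.Conv2d on the (C, H, W) map; either layer can be used to build the projections.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionSketch(nn.Module):
    """Illustrative non-local-style self-attention (hypothetical names,
    not the exact OLTR implementation)."""
    def __init__(self, in_channels, hidden_channels):
        super().__init__()
        # nn.Linear on the channel dimension == 1x1 conv on the feature map
        self.theta = nn.Linear(in_channels, hidden_channels)
        self.phi = nn.Linear(in_channels, hidden_channels)
        self.g = nn.Linear(in_channels, hidden_channels)
        self.out = nn.Linear(hidden_channels, in_channels)

    def forward(self, x):
        # x: (B, C, H, W) -> flatten spatial dims to (B, H*W, C)
        b, c, h, w = x.shape
        flat = x.flatten(2).transpose(1, 2)
        q, k, v = self.theta(flat), self.phi(flat), self.g(flat)
        # pairwise affinities between all spatial positions
        attn = F.softmax(q @ k.transpose(1, 2), dim=-1)
        y = self.out(attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # residual connection as in the non-local paper
        return x + y
```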

jchhuang commented 5 years ago

The self-attention is easy to understand, but the implementation of the spatial attention is confusing. Could you help explain the spatial attention?

zhmiao commented 5 years ago

@jchhuang OK, I see what you are saying. Actually, the spatial attention is just the feature map of that conv layer; we did not do anything special, we simply call that feature map the spatial attention. In the code it is much more straightforward. We call it spatial attention because the feature map is essentially a form of attention over the spatial locations of the input image.
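
One way to read this explanation is the sketch below: a conv layer produces a single-channel map, which is normalized over spatial positions and used to reweight the input features. The layer sizes and names are hypothetical and only illustrate the idea described above, not the exact OLTR code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionSketch(nn.Module):
    """Illustrative 'feature map as spatial attention' module
    (hypothetical sizes, not the exact OLTR implementation)."""
    def __init__(self, in_channels):
        super().__init__()
        # the conv layer whose 1-channel output is read as the attention map
        self.att_conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x):
        # x: (B, C, H, W); attention map: (B, 1, H, W)
        att = self.att_conv(x)
        b, _, h, w = att.shape
        # normalize over spatial positions so each image's map sums to 1
        att = F.softmax(att.view(b, 1, h * w), dim=-1).view(b, 1, h, w)
        # reweight the input features spatially
        return x * att
```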