hszhao / SAN

Exploring Self-attention for Image Recognition, CVPR2020.
MIT License

What calculations are performed by the Aggregation function? #4

Closed shimopino closed 4 years ago

shimopino commented 4 years ago

Thank you for your great work!

When I run the SAM module with patchwise attention (sa_type=1) for verification, the input tensors of the Aggregation function have the following shapes.

In the paper, it says that the outputs of the streams are aggregated via a Hadamard product.

Could you tell me what operations are performed on these tensors of different shapes to compute the Hadamard product?

hszhao commented 4 years ago

Hi, for a local footprint (e.g., K x K) at each position, the value tensor has shape (C/r_2) x K x K, while the learned attention weight w has shape (C/r_2/share_planes) x K x K. Each group of 'share_planes' consecutive channels shares the same attention weight. We take the Hadamard product between the values and the (shared) attention weights, then aggregate spatially over the footprint to generate the output for that position.
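
The channel-sharing Hadamard product described above can be sketched as follows. This is an illustrative example for a single position, not the repo's actual (CUDA-backed) Aggregation implementation; the tensor sizes and variable names here are assumptions chosen to match the shapes in the answer.

```python
import torch

# Assumed example sizes for one spatial position:
C_r2 = 16          # value channels at this position, i.e. C / r_2
share_planes = 8   # number of channels sharing one attention weight
K = 3              # local footprint size (K x K)

x = torch.randn(C_r2, K, K)                  # values in the local footprint
w = torch.randn(C_r2 // share_planes, K, K)  # learned attention weights

# Expand w so each group of `share_planes` channels shares one weight map,
# then take the Hadamard (element-wise) product and sum over the footprint.
w_expanded = w.repeat_interleave(share_planes, dim=0)  # shape: (C_r2, K, K)
out = (x * w_expanded).sum(dim=(1, 2))                 # shape: (C_r2,)
```

Here `out` is the aggregated output vector for this position; channels 0..7 are weighted by `w[0]` and channels 8..15 by `w[1]`, which is what "share_planes channels share the same attention weight" means.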