hszhao / SAN

Exploring Self-attention for Image Recognition, CVPR2020.
MIT License

What calculations are performed by the Aggregation function? #4

Closed shimopino closed 4 years ago

shimopino commented 4 years ago

Thank you for your great work!

When I run the SAM module with patchwise attention (sa_type=1) for verification, the input tensors of the Aggregation function have the following shapes.

In the paper, it says that the outputs of the streams are aggregated via a Hadamard product.

Could you tell me what operations are performed on these tensors of different shapes to compute the Hadamard product?

hszhao commented 4 years ago

Hi, for a local footprint (e.g., K x K) at each position, the value tensor has shape (C/r_2) x K x K, while the learned attention weight w has shape (C/r_2/share_planes) x K x K. Each group of 'share_planes' consecutive channels shares the same attention weight. We take the Hadamard product between the values and the (shared) attention weights, then aggregate spatially over the footprint to generate the output for that position.
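
The channel-sharing Hadamard product described above can be sketched as follows. This is an illustrative example for a single position, not the repo's actual (CUDA-backed) Aggregation implementation; the tensor sizes and variable names here are assumptions chosen to match the shapes in the answer.

```python
import torch

# Assumed example sizes for one spatial position:
C_r2 = 16          # value channels at this position, i.e. C / r_2
share_planes = 8   # number of channels sharing one attention weight
K = 3              # local footprint size (K x K)

x = torch.randn(C_r2, K, K)                  # values in the local footprint
w = torch.randn(C_r2 // share_planes, K, K)  # learned attention weights

# Expand w so each group of `share_planes` channels shares one weight map,
# then take the Hadamard (element-wise) product and sum over the footprint.
w_expanded = w.repeat_interleave(share_planes, dim=0)  # shape: (C_r2, K, K)
out = (x * w_expanded).sum(dim=(1, 2))                 # shape: (C_r2,)
```

Here `out` is the aggregated output vector for this position; channels 0..7 are weighted by `w[0]` and channels 8..15 by `w[1]`, which is what "share_planes channels share the same attention weight" means.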