hszhao / SAN

Exploring Self-Attention for Image Recognition, CVPR 2020.

Clarification on Aggregation #7

Closed MSiam closed 4 years ago

MSiam commented 4 years ago

Hello, I have a question regarding your code. In your patchwise attention model you are using these Cython kernels for aggregation.

I am a bit confused about what exactly it is doing; could you explain its functionality? Suppose, for example, that the input data has shape 1x256x1xHW and the weights have shape 1x32x7²xHW.

How does it perform the Hadamard product described in Equation 4 with these shapes?

Also, can I confirm the following mapping between the SAM module and Equations 4 and 5 in your paper? (A rough sketch of my understanding follows the list.)

1. conv1: phi, conv2: psi, conv3: beta
2. delta is simple concatenation
3. conv_w: gamma
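To make the question concrete, here is how I currently read that mapping for the patchwise model. The layer names are only the ones listed above, the channel sizes come from my example shapes, and the actual code in this repo may well differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# My guess at the patchwise weight computation (Eq. 5); not the actual repo code.
in_planes, rel_planes, out_planes, share_planes, k = 256, 32, 256, 8, 7

conv1 = nn.Conv2d(in_planes, rel_planes, 1)    # phi
conv2 = nn.Conv2d(in_planes, rel_planes, 1)    # psi
conv3 = nn.Conv2d(in_planes, out_planes, 1)    # beta: the values aggregated in Eq. 4
# gamma: maps the concatenated relation features to out_planes // share_planes = 32
# attention channels for each of the k*k footprint positions.
conv_w = nn.Conv2d(rel_planes * (k * k + 1), (out_planes // share_planes) * k * k, 1)

x = torch.randn(1, in_planes, 14, 14)
h, w = x.shape[2:]
phi = conv1(x)                                                   # (1, 32, H, W)
psi = F.unfold(conv2(x), k, padding=k // 2).view(1, -1, h, w)    # (1, 32*49, H, W)
delta = torch.cat([phi, psi], dim=1)                             # delta = concatenation
weights = conv_w(delta).view(1, out_planes // share_planes, k * k, h * w)  # (1, 32, 49, H*W)
values = conv3(x)                                                # (1, 256, H, W), fed to the aggregation kernel
```

The weights tensor ends up with the 1x32x7²xHW shape I quoted above, which is what makes me think conv_w plays the role of gamma.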

Thanks for your help.

hszhao commented 4 years ago

Hi, for the input data, the number of positions is HW. The weight for each local region is 32×7², with a local footprint size of 7² (a 7×7 neighborhood). The attention weights have 32 channels, shared by the 256 feature channels; the number of channels sharing the same attention weight is set to 8, as described in the last part of Sec. 5.1. Thanks.
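For reference, the operation the aggregation performs can be sketched in plain PyTorch as follows. This is an unoptimized reading of Eq. 4 with channel sharing, not the actual Cython/CUDA kernel (which presumably avoids materializing the unfolded tensor); shapes follow the example above:

```python
import torch
import torch.nn.functional as F

def aggregation_reference(values, weights, kernel_size=7, share_planes=8):
    # values:  (N, C, H, W), e.g. (1, 256, H, W) -- the beta features
    # weights: (N, C // share_planes, kernel_size**2, H*W), e.g. (1, 32, 49, H*W)
    # returns: (N, C, H, W)
    n, c, h, w = values.shape
    # Gather the kernel_size x kernel_size footprint around every position:
    # (N, C*k*k, H*W) -> (N, C, k*k, H*W)
    unfolded = F.unfold(values, kernel_size, padding=kernel_size // 2)
    unfolded = unfolded.view(n, c, kernel_size ** 2, h * w)
    # Each attention channel is shared by `share_planes` feature channels,
    # so expand the 32 weight channels to cover all 256 feature channels.
    w_full = weights.repeat_interleave(share_planes, dim=1)      # (N, C, k*k, H*W)
    # Eq. 4: Hadamard product with the attention weights, summed over the footprint.
    out = (unfolded * w_full).sum(dim=2)                         # (N, C, H*W)
    return out.view(n, c, h, w)

# Example with the shapes from the question: 7x7 footprint, 32 attention channels, 256 feature channels.
y = aggregation_reference(torch.randn(1, 256, 14, 14), torch.randn(1, 32, 49, 196))
```

In this sketch, feature channel c uses attention channel c // share_planes, which is one way to realize "32 attention channels shared by 256 feature channels with a share factor of 8".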