Hi, for the input data, the number of positions is HW. The weight for each local region is 32×7², with a local footprint size of 7² (i.e., a 7×7 window). The attention weights have 32 channels, shared by the 256 feature channels; the number of channels sharing the same attention weight is set to 8, as described at the end of Sec. 5.1. Thanks.
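For illustration, the same aggregation can be sketched in plain PyTorch roughly as below. This is only a sketch, not the Cython kernel itself; the assumption that consecutive feature channels share one attention channel is mine.

```python
# Plain-PyTorch sketch of the aggregation, using the shapes discussed here
# (7x7 footprint, 32 attention channels shared by 256 feature channels,
# 8 channels per group). Consecutive-channel grouping is an assumption.
import torch
import torch.nn.functional as F

def aggregate(features, weights, H, W, footprint=7):
    """features: (B, 256, 1, H*W), weights: (B, 32, footprint**2, H*W)."""
    B, C, _, HW = features.shape
    _, A, K2, _ = weights.shape

    # Gather the 7x7 neighbourhood of every position (im2col).
    cols = F.unfold(features.view(B, C, H, W), footprint, padding=footprint // 2)
    cols = cols.view(B, C, K2, HW)

    # Broadcast each of the 32 attention maps over its group of 256/32 = 8
    # feature channels, take the Hadamard product of eq. 4, then sum over
    # the footprint to aggregate the local region.
    w = weights.repeat_interleave(C // A, dim=1)       # (B, 256, K2, H*W)
    return (cols * w).sum(dim=2).view(B, C, 1, HW)

# e.g. a 16x16 feature map:
x = torch.randn(1, 256, 1, 16 * 16)
w = torch.softmax(torch.randn(1, 32, 49, 16 * 16), dim=2)
y = aggregate(x, w, H=16, W=16)                        # (1, 256, 1, 256)
```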
Hello, I had a question regarding your code. In your patchwise attention model you are using these Cython kernels for aggregation.
I was a bit confused about what exactly it is doing; can you explain its functionality? If I have, for example, input data of shape 1x256x1xHW and weights of shape 1x32x7²xHW,
how does it perform the Hadamard product described in Equation 4?
Also, can I confirm the following mapping between the SAM module and Equations 4 and 5 in your paper:
1. conv1: phi, conv2: psi, conv3: beta
2. delta is simple concatenation
3. conv_w: gamma
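In other words, my rough reading of the forward pass is the sketch below; the layer names, 1×1 kernel sizes, embedding width, and the softmax over the footprint are my guesses rather than anything taken from your code:

```python
# My rough reading of the SAM wiring (eqs. 4-5), just to check my
# understanding -- 1x1 kernels, embedding width and the softmax are guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAMSketch(nn.Module):
    def __init__(self, in_ch=256, emb_ch=64, attn_ch=32, footprint=7):
        super().__init__()
        self.k2, self.footprint = footprint ** 2, footprint
        self.conv1 = nn.Conv2d(in_ch, emb_ch, 1)                    # phi
        self.conv2 = nn.Conv2d(in_ch, emb_ch, 1)                    # psi
        self.conv3 = nn.Conv2d(in_ch, in_ch, 1)                     # beta (values)
        self.conv_w = nn.Conv2d(2 * emb_ch, attn_ch * self.k2, 1)   # gamma

    def forward(self, x):                                           # (B, 256, H, W)
        B, C, H, W = x.shape
        # delta: simple concatenation of the phi/psi embeddings (I assume
        # both are computed from the same feature map x here)
        w = self.conv_w(torch.cat([self.conv1(x), self.conv2(x)], dim=1))
        w = w.view(B, -1, self.k2, H * W).softmax(dim=2)            # (B, 32, 49, HW)
        # beta values, unfolded into their 7x7 neighbourhoods
        v = F.unfold(self.conv3(x), self.footprint, padding=self.footprint // 2)
        v = v.view(B, C, self.k2, H * W)
        # eq. 4: each attention channel is shared by a group of 8 value
        # channels; Hadamard product, then sum over the footprint
        w = w.repeat_interleave(C // w.shape[1], dim=1)
        return (v * w).sum(dim=2).view(B, C, H, W)

y = SAMSketch()(torch.randn(1, 256, 32, 32))   # (1, 256, 32, 32)
```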
Thanks for your help.