Closed — shimopino closed this issue 4 years ago
Hi, for a local footprint (e.g., KxK) at each position, the tensor has shape (C/r_2)xKxK and the learned attention weights w have shape (C/r_2/share_planes)xKxK. Every 'share_planes' channels share the same attention weight. We perform a Hadamard product followed by spatial aggregation to generate the output for that position.
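The broadcasting described above can be sketched per position roughly as follows. This is a minimal illustration, not the repo's actual (CUDA) Aggregation implementation; the shapes, the variable names, and the assumption that channel groups are contiguous blocks of `share_planes` channels are mine:

```python
import torch

# Hypothetical per-position shapes for illustration (not taken from the repo)
C, share_planes, K = 64, 8, 3
v = torch.randn(C, K, K)                  # feature values in the K x K footprint
w = torch.randn(C // share_planes, K, K)  # learned attention weights

# Every `share_planes` channels reuse the same weight map, so expand w
# from C/share_planes channels to C channels (contiguous-group assumption).
w_full = w.repeat_interleave(share_planes, dim=0)  # (C, K, K)

# Hadamard product, then spatial aggregation over the footprint.
out = (v * w_full).sum(dim=(1, 2))  # (C,)
print(out.shape)  # torch.Size([64])
```

The expansion step is what reconciles the two differently shaped tensors before the elementwise product.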
Thank you for your great work!
When I run the SAM module with patchwise attention (sa_type=1) for verification, the input tensors of the Aggregation function have the following shapes.
In the paper, it says that the outputs of the streams are aggregated via a Hadamard product.
Could you tell me what operations are performed on tensors of these different shapes to compute the Hadamard product?