Our Global Attention Upsample module performs global average pooling to provide
global context as a guidance of low-level features to select category localization details. In
detail, we perform 3×3 convolution on the low-level features to reduce channels of feature
maps from CNNs. The global context generated from high-level features is through a 1×1
convolution with batch normalization and ReLU non-linearity, then multiplied by the low-level features. Finally, high-level features are added with the weighted low-level features and
upsampled gradually
Your implementation has a slight variation in implementation.
Hi, Quoting the GAU explanation from the paper
Your implementation has a slight variation in implementation.