Wrong implementation of GAU

Hi, quoting the GAU explanation from the paper:

"Our Global Attention Upsample module performs global average pooling to provide global context as a guidance of low-level features to select category localization details. In detail, we perform 3×3 convolution on the low-level features to reduce channels of feature maps from CNNs. The global context generated from high-level features is through a 1×1 convolution with batch normalization and ReLU non-linearity, then multiplied by the low-level features. Finally, high-level features are added with the weighted low-level features and upsampled gradually."

Your implementation differs slightly from this description.
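
For reference, here is a minimal PyTorch sketch of the data flow as I read the quoted paragraph. It is not taken from the paper's code or from this repository. The class name `GAU`, the argument names, the choice to reduce the low-level features to the high-level channel count, and the bilinear upsampling of the high-level features before the addition are all my own assumptions; the paragraph does not pin these details down.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GAU(nn.Module):
    """Hypothetical sketch of Global Attention Upsample, following the quoted text."""

    def __init__(self, channels_high: int, channels_low: int):
        super().__init__()
        # 3x3 convolution on the low-level features to reduce their channels.
        # Assumption: reduce to the high-level channel count so the later
        # multiply and add are shape-compatible.
        self.conv_low = nn.Conv2d(
            channels_low, channels_high, kernel_size=3, padding=1, bias=False
        )
        # Global context branch: 1x1 convolution + batch norm (+ ReLU in forward).
        self.conv_ctx = nn.Conv2d(channels_high, channels_high, kernel_size=1, bias=False)
        self.bn_ctx = nn.BatchNorm2d(channels_high)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # Low-level path: channel reduction only.
        low = self.conv_low(low)

        # Global average pooling over the high-level features -> (N, C, 1, 1).
        ctx = F.adaptive_avg_pool2d(high, output_size=1)
        ctx = F.relu(self.bn_ctx(self.conv_ctx(ctx)))

        # The global context weights the low-level features channel by channel
        # (broadcast over the spatial dimensions).
        weighted_low = low * ctx

        # Assumption: bring the high-level features to the low-level resolution
        # with bilinear interpolation so the two tensors can be added.
        high_up = F.interpolate(
            high, size=low.shape[2:], mode="bilinear", align_corners=False
        )
        return high_up + weighted_low


if __name__ == "__main__":
    # eval() avoids the batch-norm error raised in training mode when a
    # (N, C, 1, 1) tensor has batch size 1.
    gau = GAU(channels_high=8, channels_low=16).eval()
    high = torch.randn(2, 8, 4, 4)    # coarse, high-level features
    low = torch.randn(2, 16, 8, 8)    # finer, low-level features
    with torch.no_grad():
        out = gau(high, low)
    print(out.shape)
```

The point to compare against is the order of operations: the pooled context goes through the 1×1 convolution, batch norm and ReLU first, is then multiplied into the 3×3-convolved low-level features, and only after that are the high-level features added.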

xgmiao / Pyramid-Attention-Networks
