houqb / CoordAttention

Code for our CVPR 2021 paper on coordinate attention
MIT License
1.02k stars 122 forks

About the batchnorm in CoordAttention #30

Open hello-trouble opened 3 years ago

hello-trouble commented 3 years ago

Hello, thank you for your excellent work on this attention module. I am a little puzzled about the code: compared to SENet, there is a BatchNorm operation in CoordAttention. Is it necessary for the attention mechanism? In addition, when the inputs are normalized between -1 and 1, is it necessary to replace the ReLU-based operation (the `self.relu(x + 3) / 6`) with an ordinary ReLU?
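For context, the `self.relu(x + 3) / 6` expression is a hard-sigmoid built from ReLU6, and the BatchNorm sits on the shared 1x1 reduction convolution of the attention block. Below is a minimal PyTorch sketch of those pieces, assuming the structure described in the paper; layer names like `conv1`/`bn1` are illustrative and may not match the released code exactly.

```python
import torch
import torch.nn as nn


class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, a piecewise-linear approximation of sigmoid."""
    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        # This is the `self.relu(x + 3) / 6` expression from the question.
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    """Hard swish: x * hard_sigmoid(x), the cheap Swish approximation from MobileNetV3."""
    def __init__(self, inplace=True):
        super().__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class CoordAttSketch(nn.Module):
    """Illustrative slice of coordinate attention, showing where BatchNorm appears."""
    def __init__(self, inp, oup, reduction=32):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (n, c, 1, w)
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mip)   # the BatchNorm the question asks about
        self.act = h_swish()             # nonlinearity applied after the BN
        self.conv_h = nn.Conv2d(mip, oup, kernel_size=1)
        self.conv_w = nn.Conv2d(mip, oup, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.size()
        x_h = self.pool_h(x)                           # (n, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)       # (n, c, w, 1)
        y = torch.cat([x_h, x_w], dim=2)               # concatenate the two directions
        y = self.act(self.bn1(self.conv1(y)))          # 1x1 conv -> BN -> activation
        x_h, x_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(x_h).sigmoid()                            # attention along height
        a_w = self.conv_w(x_w.permute(0, 1, 3, 2)).sigmoid()        # attention along width
        return x * a_h * a_w
```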

houqb commented 3 years ago

In mobile network training, it is better to use ReLU6 or Swish, which is smooth; MobileNetV3 has demonstrated this.
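For comparison, the activations mentioned here can be sketched side by side; Swish is the standard x * sigmoid(x), and hard-swish is the cheap piecewise approximation popularized by MobileNetV3. This is only a quick illustration of how they differ around zero, not part of the released code.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-4, 4, 9)

relu    = F.relu(x)                    # kink at 0, exactly zero for x < 0
relu6   = F.relu6(x)                   # ReLU capped at 6
swish   = x * torch.sigmoid(x)         # smooth, slightly negative for small x < 0
h_swish = x * F.relu6(x + 3) / 6       # MobileNetV3's hard-swish approximation of Swish

for name, y in [("relu", relu), ("relu6", relu6), ("swish", swish), ("h-swish", h_swish)]:
    print(f"{name:8s}", [round(v, 3) for v in y.tolist()])
```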