Hello, thank you for your excellent work on the attention module. I am a little puzzled by the code. Compared to SENet, there is a BatchNorm operation in CoordAttention. Is it necessary for the attention mechanism? In addition, should I replace the activation (the `self.relu(x + 3) / 6`) with an ordinary ReLU when the inputs are normalized between -1 and 1?
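For reference, here is a minimal sketch of the construct I am asking about, assuming `self.relu` is `nn.ReLU6` so that `self.relu(x + 3) / 6` is the MobileNetV3-style hard-sigmoid gate (the class name `HSigmoid` here is just illustrative):

```python
import torch
import torch.nn as nn

class HSigmoid(nn.Module):
    """Hard-sigmoid gate: ReLU6(x + 3) / 6, a piecewise-linear
    approximation of sigmoid that saturates to [0, 1]."""
    def __init__(self, inplace: bool = True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Maps x <= -3 to 0 and x >= 3 to 1, linear in between.
        # SENet uses torch.sigmoid at the corresponding point instead.
        return self.relu(x + 3) / 6
```

My question is whether this gating behavior still makes sense, or should be swapped for a plain `nn.ReLU`, when the network inputs are already normalized to [-1, 1].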