tengshaofeng / ResidualAttentionNetwork-pytorch

A PyTorch implementation of the Residual Attention Network. This code is based on two projects from

Mixed attention, Channel attention and Spatial attention #15

Open YANYANYEAH opened 5 years ago

YANYANYEAH commented 5 years ago

Hello, I studied your code carefully and found that the paper gives different formulas for Mixed Attention, Channel Attention and Spatial Attention, but I don't see an explicit implementation of f(x_i,c) in your code. I have only just started learning about deep networks. How should I modify the network if I want to express the different kinds of attention? Thank you!

YANYANYEAH commented 5 years ago

Sorry, I read the paper more carefully and found the following paragraph: "Mixed attention f1 without additional restriction use simple sigmoid for each channel and spatial position. Channel attention f2 performs L2 normalization within all channels for each spatial position to remove spatial information. Spatial attention f3 performs normalization within feature map from each channel and then sigmoid to get soft mask related to spatial information only." However, I don't know exactly how to implement f2 and f3. Suppose the feature size is [batch_size, channel, height, width]. Does f2 use nn.BatchNorm2d(channel) to normalize each channel? Does f3 use nn.BatchNorm2d(height * width) to normalize each spatial location and then apply a sigmoid?
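
For reference, here is a minimal sketch of what the three formulas from that paragraph could look like in PyTorch, assuming a feature tensor of shape [batch_size, channel, height, width]. The function names f1/f2/f3 only mirror the paper's notation; this is not code from the repository.

```python
# Minimal sketch of the three attention activations described in the quoted
# paragraph. Input x has shape [batch_size, channel, height, width].
import torch

def f1_mixed(x):
    # Mixed attention: plain sigmoid at every channel and spatial position.
    return torch.sigmoid(x)

def f2_channel(x, eps=1e-8):
    # Channel attention: L2-normalize across the channel dimension at each
    # spatial position, which removes spatial information.
    norm = x.norm(p=2, dim=1, keepdim=True)   # [B, 1, H, W]
    return x / (norm + eps)

def f3_spatial(x, eps=1e-8):
    # Spatial attention: normalize each channel's feature map over its own
    # spatial positions (per-channel mean/std), then apply sigmoid.
    mean = x.mean(dim=(2, 3), keepdim=True)   # [B, C, 1, 1]
    std = x.std(dim=(2, 3), keepdim=True)     # [B, C, 1, 1]
    return torch.sigmoid((x - mean) / (std + eps))
```

Note that f2 and f3 here normalize directly rather than using nn.BatchNorm2d, since the paper describes L2 normalization over channels and per-channel normalization over spatial positions, not learned batch statistics.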

tengshaofeng commented 5 years ago

You can treat the Squeeze-and-Excitation Network as channel attention f2: global pooling is applied to each channel, followed by an MLP that outputs a weight for each channel. Spatial attention means that each pixel in every feature map gets its own weight.
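
A minimal sketch of a Squeeze-and-Excitation style channel-attention block along those lines (the reduction ratio and layer sizes are illustrative assumptions, not taken from this repository):

```python
# SE-style channel attention: global pooling per channel, then a small MLP
# that produces one weight per channel, used to rescale the feature maps.
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global pooling per channel
        self.fc = nn.Sequential(                       # excitation: MLP -> channel weights
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # reweight each channel
```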

YANYANYEAH commented 5 years ago

Thank you very much for your answer. I have been trying to solve it with normalization, and I will try your suggestion for each attention type.

fengshenfeilian commented 5 years ago

Hello, I am confused about the f1 (mixed) attention. Does it mean using conv -> relu -> conv -> sigmoid to operate on the feature maps?
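
As a rough illustration of the conv -> relu -> conv -> sigmoid idea asked about here (this is only a sketch of a mixed-attention mask head, not a copy of the repository's AttentionModule; layer sizes are assumptions):

```python
# Sketch of an f1 (mixed attention) mask head: two 1x1 convolutions followed
# by a sigmoid, applied to the soft mask branch's feature maps.
import torch.nn as nn

def mask_head(channels):
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(channels, channels, kernel_size=1),
        nn.Sigmoid(),
    )
```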