junfu1115 / DANet

Dual Attention Network for Scene Segmentation (CVPR2019)

Questions about feature channels in DANet #131

Open HYERI520 opened 3 years ago

HYERI520 commented 3 years ago

Hi, I'm quite new to this area and still learning. I have some questions from studying your paper and code.

In the code below, I think these are the convolutions used for B, C, and D in the PAM (`da_att.py`):

```python
self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)  # B
self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)    # C
self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)     # D
```

  1. I want to know why the channels are reduced by a factor of 8 in B and C but not in D. And why 8? (See my sketch after this list.)
  2. In the PAM, why do you use a kernel size of 1 for these convolutions?
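
To make my questions concrete, here is a minimal sketch of how I read the PAM forward pass (based on `da_att.py`; the shape comments are my own annotation, so please correct me if I misread it):

```python
import torch
import torch.nn as nn

class PAMSketch(nn.Module):
    """My reading of the Position Attention Module forward pass."""
    def __init__(self, in_dim):
        super().__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)  # B
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)    # C
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)       # D
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        # B and C only feed the (HW x HW) attention map, whose shape does not
        # depend on their channel count, so reducing to in_dim//8 saves compute.
        query = self.query_conv(x).view(b, -1, h * w).permute(0, 2, 1)  # (b, HW, c//8)
        key = self.key_conv(x).view(b, -1, h * w)                        # (b, c//8, HW)
        attention = torch.softmax(torch.bmm(query, key), dim=-1)         # (b, HW, HW)
        # D must keep in_dim channels so the attended features can be added
        # back to x (same shape) in the residual connection.
        value = self.value_conv(x).view(b, -1, h * w)                    # (b, c, HW)
        out = torch.bmm(value, attention.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x
```

From this I can see that the attention map's size is independent of the B/C channel count, which may be why the reduction is safe, and that a 1x1 kernel acts as a per-pixel linear projection. But I'd still like to understand why the factor is specifically 8.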

In "danet.py", after self.sa(), the output of PAM, sa_feat pass two convolutions, conv51 and conv6. (same in CAM) feat1 = self.conv5a(x) sa_feat = self.sa(feat1) sa_conv = self.conv51(sa_feat) sa_output = self.conv6(sa_conv)

  1. Why does the output feature of the PAM pass through two convolutions? From the paper, I thought there was just one convolution before the element-wise summation: "we transform the outputs of two attention modules by a convolution layer and perform an element-wise sum to accomplish feature fusion."
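
For reference, this is roughly what I understand `conv51` and `conv6` to be (my reading of `danet.py`; the channel counts below are placeholder values I chose, not taken from the repo config):

```python
import torch.nn as nn

# Placeholder values for illustration only; e.g. 19 classes for Cityscapes.
inter_channels, out_channels = 512, 19

conv51 = nn.Sequential(  # 3x3 conv + BN + ReLU that refines the attended features
    nn.Conv2d(inter_channels, inter_channels, 3, padding=1, bias=False),
    nn.BatchNorm2d(inter_channels),
    nn.ReLU(),
)
conv6 = nn.Sequential(   # dropout + 1x1 conv that maps features to class scores
    nn.Dropout2d(0.1, False),
    nn.Conv2d(inter_channels, out_channels, 1),
)
```

If `conv6` is a separate per-branch classifier head rather than part of the fusion described in the paper, that might explain the difference, but I'm not sure.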

Thank you.

wlj567 commented 3 years ago

Hello, have you gotten the complete code to run? Would it be convenient to exchange contact information? Thank you!