Hi, I'm quite new to this area and still learning.
Some questions came up while studying your paper and code.
In the code below from PAM (da_att.py), I think these three convolutions correspond to B, C, and D:
self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1) # B
self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1) # C
self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1) # D
I want to know why the channels are reduced by a factor of 8 in B and C but not in D, and why the factor is 8 specifically.
Also, why does PAM use a kernel size of 1 for these convolutions?
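To make my question concrete, here is a minimal sketch of how I understand the PAM forward pass. This is my own simplified code based on reading da_att.py, not your exact implementation; the class name PAMSketch and the variable names are mine.

import torch
from torch import nn

class PAMSketch(nn.Module):
    """My simplified reading of PAM, only to check the tensor shapes."""
    def __init__(self, in_dim):
        super().__init__()
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)  # B
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)    # C
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)       # D
        self.gamma = nn.Parameter(torch.zeros(1))
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, c, h, w = x.size()
        # B reshaped to (b, HW, c//8), C reshaped to (b, c//8, HW)
        proj_query = self.query_conv(x).view(b, -1, h * w).permute(0, 2, 1)
        proj_key = self.key_conv(x).view(b, -1, h * w)
        # the attention map is (b, HW, HW), independent of the c//8 reduction
        attention = self.softmax(torch.bmm(proj_query, proj_key))
        # D keeps all c channels, so the weighted sum has the same shape as x
        proj_value = self.value_conv(x).view(b, -1, h * w)
        out = torch.bmm(proj_value, attention.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x

If I read it correctly, the attention map size depends only on H x W, so the //8 in B and C only makes the energy computation cheaper, while D has to keep in_dim channels so the output matches x. Is that the intended reason, and is 8 just an empirical choice?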
In "danet.py", after self.sa(), the output of PAM, sa_feat pass two convolutions, conv51 and conv6. (same in CAM)
feat1 = self.conv5a(x)
sa_feat = self.sa(feat1)
sa_conv = self.conv51(sa_feat)
sa_output = self.conv6(sa_conv)
Why does the output feature of PAM pass through two convolutions? From the paper I thought there was just one convolution layer before the element-wise summation:
"we transform the outputs of two attention modules by a convolution layer and perform an element-wise sum to accomplish feature fusion."
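For reference, this is roughly what I expected the fusion to look like from that sentence. It is a minimal sketch of my own; FusionSketch, conv_p, conv_c, and conv_out are names I made up, not the ones in danet.py.

import torch
from torch import nn

class FusionSketch(nn.Module):
    """My reading of the paper's fusion step: one conv per branch, then a sum."""
    def __init__(self, inter_channels, out_channels):
        super().__init__()
        self.conv_p = nn.Conv2d(inter_channels, inter_channels, 3, padding=1)  # one conv on the PAM output
        self.conv_c = nn.Conv2d(inter_channels, inter_channels, 3, padding=1)  # one conv on the CAM output
        self.conv_out = nn.Conv2d(inter_channels, out_channels, 1)             # final prediction layer

    def forward(self, sa_feat, sc_feat):
        # element-wise sum of the transformed branch outputs = feature fusion
        feat_sum = self.conv_p(sa_feat) + self.conv_c(sc_feat)
        return self.conv_out(feat_sum)

So is conv51 the "convolution layer" mentioned in the paper, and conv6 something extra (for example a per-branch prediction head), or are both of them part of the fusion step?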
Thank you.