Open cats-food opened 3 years ago
Also, I found another issue. In `Attention.py`, a transposed convolution is used to reconstruct the feature map, as shown in this line:
feature_map = F.conv_transpose2d(attention_scores, conv_kernels, stride = 1, padding = self.patch_size//2)
where `conv_kernels` has been normalized in a previous step (`conv_kernels = conv_kernels/norm_factor`).
However, I think the `conv_kernels` used here should be the original, unnormalized one. Could you please help me check this? Thanks! @jingyuanli001
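To make the concern concrete, here is a minimal sketch; it is not the repository's code. The tensor shapes, the random inputs, and the way `norm_factor` is computed (a per-kernel L2 norm) are all assumptions for illustration. Since `F.conv_transpose2d` is linear in each kernel, reconstructing with the normalized kernels should be equivalent to reconstructing with the original kernels while dividing each patch's attention score by that patch's norm:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
patch_size, channels, num_patches = 3, 4, 5

# Stand-in for the patches extracted from the feature map (assumption).
conv_kernels_raw = torch.randn(num_patches, channels, patch_size, patch_size)

# Assumed normalization: per-kernel L2 norm. The repository may differ.
norm_factor = conv_kernels_raw.pow(2).sum(dim=(1, 2, 3), keepdim=True).sqrt()
conv_kernels_normed = conv_kernels_raw / norm_factor

# Stand-in attention scores: one channel per patch, softmax over patches.
attention_scores = torch.softmax(torch.randn(1, num_patches, 8, 8), dim=1)

pad = patch_size // 2
recon_normed = F.conv_transpose2d(attention_scores, conv_kernels_normed, stride=1, padding=pad)
recon_raw = F.conv_transpose2d(attention_scores, conv_kernels_raw, stride=1, padding=pad)

# Rescaling each patch's score by its norm recovers the raw-kernel result
# from the normalized kernels.
rescaled_scores = attention_scores * norm_factor.view(1, num_patches, 1, 1)
recon_rescaled = F.conv_transpose2d(rescaled_scores, conv_kernels_normed, stride=1, padding=pad)

print(torch.allclose(recon_raw, recon_rescaled, atol=1e-5))
print(torch.allclose(recon_raw, recon_normed, atol=1e-5))
```

Under these assumptions, the two reconstructions differ unless every patch has unit norm, so the choice of kernel changes how strongly each patch contributes to the output.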
First of all, thanks for your great work! I have a question about your code in `Attention.py`, where line 39 is as follows:
conv_result = F.avg_pool2d(conv_result, 3, 1, padding = 1)*9
This corresponds to equation (5) in your paper.
My question is: 3×3 average pooling already produces the averaged result, so why is the multiplication by 9 needed? Hope to hear your reply : )
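For reference, a small check of what that line computes numerically, with a random input tensor of assumed shape. With PyTorch's default `count_include_pad=True`, `F.avg_pool2d` always divides by 9 for a 3×3 window, so multiplying by 9 turns the average into a plain sum over the zero-padded 3×3 neighborhood. This only shows what the operation does; it does not say why the paper uses a sum:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical input; the shape is an assumption for illustration.
conv_result = torch.randn(1, 2, 6, 6)

# The expression from the issue: 3x3 average pooling, stride 1, times 9.
pooled_times_9 = F.avg_pool2d(conv_result, 3, 1, padding=1) * 9

# Explicit 3x3 window sum: a depthwise convolution with an all-ones kernel.
ones_kernel = torch.ones(2, 1, 3, 3)
window_sum = F.conv2d(conv_result, ones_kernel, stride=1, padding=1, groups=2)

print(torch.allclose(pooled_times_9, window_sum, atol=1e-5))
```

Note that the equivalence at the borders relies on the padded zeros being counted in the divisor; with `count_include_pad=False` the two results would differ along the edges.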