Closed henbucuoshanghai closed 1 year ago
不要这一代码 不好吗?
Do you mean the relative positional encoding (RPE) in our pooling-based attention? RPE has negligible computational cost and can bring significant improvement both on image classification and semantic segmentation as shown in our ablation study.
You can refer to our paper for more details. Thanks.
l(pool) 的l是self.d_convs 为什么是四个组卷积呢? self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]])
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
It is because we have 4 different pooling ratios corresponding to 4 pooled maps in implementation.
pool = pool + l(pool)