yuhuan-wu / P2T

[TPAMI22] Pyramid Pooling Transformer for Scene Understanding

Why is there an extra pool = pool + l(pool)? #12

Closed henbucuoshanghai closed 1 year ago

henbucuoshanghai commented 1 year ago

pool = pool + l(pool)

henbucuoshanghai commented 1 year ago

Wouldn't it be fine to drop this line of code?

yuhuan-wu commented 1 year ago

Do you mean the relative positional encoding (RPE) in our pooling-based attention? RPE has negligible computational cost yet brings significant improvements on both image classification and semantic segmentation, as shown in our ablation study.

You can refer to our paper for more details. Thanks.
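(Editor's note: a minimal sketch of the step being asked about, assuming illustrative values for dim and the pooled size rather than the repository's exact code. A depthwise 3x3 convolution is applied to a pooled feature map and its output is added back, acting as the conditional RPE the author describes.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64  # assumed channel count, for illustration only

# Depthwise 3x3 conv (groups == channels), the `l` in `pool = pool + l(pool)`
l = nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, groups=dim)

x = torch.randn(1, dim, 56, 56)           # (B, C, H, W) input feature map
pool = F.adaptive_avg_pool2d(x, (7, 7))   # one pooled map (size assumed)
pool = pool + l(pool)                     # the questioned line: pooled features + RPE
print(pool.shape)                         # torch.Size([1, 64, 7, 7])
```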

henbucuoshanghai commented 1 year ago

The l in l(pool) is self.d_convs. Why are there four grouped (depthwise) convolutions?

self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]])


yuhuan-wu commented 1 year ago

That is because we use 4 different pooling ratios, corresponding to 4 pooled maps in the implementation.
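(Editor's note: a hedged sketch of that one-to-one correspondence; pool_ratios, dim, and the input size are assumed values, not necessarily the repository's defaults. Each pooling ratio yields one pooled map, and each map is refined by its own depthwise conv from the ModuleList.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 64                        # assumed channel count
pool_ratios = [12, 16, 20, 24]  # assumed ratios: 4 ratios -> 4 pooled maps

# One depthwise conv per pooling ratio, mirroring the ModuleList in question
d_convs = nn.ModuleList([
    nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, groups=dim)
    for _ in pool_ratios
])

x = torch.randn(1, dim, 56, 56)
pools = []
for ratio, l in zip(pool_ratios, d_convs):   # 4 ratios paired with 4 convs
    pool = F.adaptive_avg_pool2d(x, (max(x.shape[2] // ratio, 1),
                                     max(x.shape[3] // ratio, 1)))
    pool = pool + l(pool)                    # per-scale RPE
    pools.append(pool.flatten(2))            # (B, C, h*w) tokens

tokens = torch.cat(pools, dim=2)             # multi-scale pooled tokens for attention
print(tokens.shape)                          # torch.Size([1, 64, 33]) for these sizes
```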