sail-sg / poolformer

PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
https://arxiv.org/abs/2111.11418
Apache License 2.0

Why not conduct the experiment to directly compare the pooling and DW convolution #23

Closed (ydhongHIT closed this issue 2 years ago)

ydhongHIT commented 2 years ago

Thanks for your great work. Are there any experiments that directly compare 3x3 average pooling with 3x3 depthwise (DW) convolution, for example by replacing the pooling with a 3x3 DW convolution in the same architecture?

yuweihao commented 2 years ago

Hi @ydhongHIT, thanks for your interest. The goal of our paper is to demonstrate that the competence of transformer models stems primarily from the general MetaFormer architecture. To that end, we deliberately chose pooling, the simplest token mixer, to instantiate MetaFormer. A token-mixer comparison between pooling and DW conv does not fit that goal well. We plan to release more MetaFormer models with different token mixers (e.g., DW conv) around March. More experimental results may be added to a future revision of this paper or reported in a new tech report.
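
To make the relationship between the two token mixers in this thread concrete, here is a minimal, dependency-free sketch in plain Python. It is not code from the repository and not an experiment from the paper. It illustrates that a stride-1 3x3 average pooling can be read as a 3x3 depthwise convolution whose weights are all fixed to 1/9, whereas a DW convolution learns a separate 3x3 kernel per channel. The function names (`depthwise_conv3x3`, `avg_pool3x3`, `max_abs_diff`) and the constant `UNIFORM` are hypothetical. The sketch also assumes zero padding with padded positions counted in the mean; the released model's border handling may differ, and a real model would use a deep learning framework's pooling and grouped-convolution layers instead.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature map
Kernel = List[List[float]]   # one 3x3 filter


def depthwise_conv3x3(channels: List[Channel], kernels: List[Kernel]) -> List[Channel]:
    """Stride-1, zero-padded 3x3 depthwise conv: channel c is filtered only by kernels[c]."""
    out = []
    for x, k in zip(channels, kernels):
        h, w = len(x), len(x[0])
        y = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        # Taps that fall outside the map read zero padding.
                        if 0 <= ii < h and 0 <= jj < w:
                            y[i][j] += k[di + 1][dj + 1] * x[ii][jj]
        out.append(y)
    return out


def avg_pool3x3(channels: List[Channel]) -> List[Channel]:
    """Stride-1 3x3 average pooling; padded zeros count toward the mean (assumption)."""
    out = []
    for x in channels:
        h, w = len(x), len(x[0])
        y = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                total = 0.0
                for ii in range(max(i - 1, 0), min(i + 2, h)):
                    for jj in range(max(j - 1, 0), min(j + 2, w)):
                        total += x[ii][jj]
                y[i][j] = total / 9.0
        out.append(y)
    return out


def max_abs_diff(a: List[Channel], b: List[Channel]) -> float:
    """Largest element-wise absolute difference between two stacks of feature maps."""
    return max(
        abs(p - q)
        for ca, cb in zip(a, b)
        for ra, rb in zip(ca, cb)
        for p, q in zip(ra, rb)
    )


# Pooling corresponds to this fixed, parameter-free kernel shared by all channels.
UNIFORM: Kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

x = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
pooled = avg_pool3x3(x)
conv = depthwise_conv3x3(x, [UNIFORM, UNIFORM])
print(max_abs_diff(pooled, conv))  # difference is only floating-point rounding
```

Under these assumptions, swapping pooling for a DW convolution in the same architecture amounts to replacing the fixed `UNIFORM` kernel with learnable per-channel kernels, which adds 9 weights per channel to each token mixer.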