ydhongHIT closed this issue 2 years ago
Hi @ydhongHIT , thanks for your attention. The goal of our paper is to demonstrate that the competence of transformer models primarily stems from the general MetaFormer architecture. To achieve this, we selected pooling, the simplest token mixer, to demonstrate MetaFormer. A token mixer comparison between pooling and depthwise convolution does not quite fit that goal. We plan to release more MetaFormer models with different token mixers (e.g., depthwise convolution) around March. More experimental results may be added to a future revised version of this paper or reported in a new tech report.
Thanks for your great work. Are there any experiments comparing 3x3 average pooling with 3x3 depthwise convolution, for example, directly replacing pooling with a 3x3 depthwise convolution in the same architecture?
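For readers following the thread, a minimal sketch of the comparison being asked about: the pooling token mixer as described in the PoolFormer paper (average pooling with the identity subtracted, since the block has a residual connection), next to a 3x3 depthwise convolution that could be dropped into the same slot. The `DWConvMixer` class name and interface here are assumptions for illustration, not code from the authors' repository.

```python
import torch
import torch.nn as nn


class Pooling(nn.Module):
    """Pooling token mixer (form described in the PoolFormer paper):
    average pooling with the input subtracted, because the surrounding
    block already adds a residual connection."""

    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1,
                                 padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x):
        return self.pool(x) - x


class DWConvMixer(nn.Module):
    """Hypothetical drop-in replacement: a 3x3 depthwise convolution
    (groups=dim makes each channel convolve only with itself)."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size,
                              stride=1, padding=kernel_size // 2,
                              groups=dim)

    def forward(self, x):
        return self.conv(x)


# Both mixers preserve the (batch, channels, height, width) shape,
# so one can substitute for the other inside the same MetaFormer block.
x = torch.randn(2, 64, 14, 14)
print(Pooling()(x).shape)        # torch.Size([2, 64, 14, 14])
print(DWConvMixer(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```

Note the trade-off the comparison probes: pooling has no learnable parameters, while the depthwise convolution adds `dim * kernel_size**2` weights per mixer.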