raoyongming / GFNet

[NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification
https://gfnet.ivg-research.xyz/
MIT License

Question about block design #11

Closed hsi1032 closed 2 years ago

hsi1032 commented 2 years ago

Hello, thanks for your great work!

In your figure and code, there is no skip connection after the global filter layer.

This differs from the original Transformer implementation, which has two skip connections in a single block (one around the self-attention layer and one around the FFN layer).

For example, the original Transformer uses blocks like

x = x + SA(x)
x = x + FFN(x)

The Global Filter Network, however, uses the following block:

x_ = Global_Filter(x)
x = x + FFN(x_)
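(For reference, my understanding of Global_Filter is a learnable element-wise multiplication in the 2D frequency domain. Below is a minimal PyTorch sketch, assuming a square token grid and the rfft2-based formulation from the paper; the shape parameters h and w are illustrative, not the exact ones in this repo.)

```python
import torch
import torch.nn as nn

class GlobalFilter(nn.Module):
    # Sketch of a frequency-domain filter over (B, N, C) tokens,
    # where N = h * h is a square spatial grid.
    def __init__(self, dim, h=14, w=8):  # w = h // 2 + 1 to match rfft2 output
        super().__init__()
        # Complex weights stored as a trailing real/imag pair
        self.complex_weight = nn.Parameter(torch.randn(h, w, dim, 2) * 0.02)

    def forward(self, x):
        B, N, C = x.shape
        a = b = int(N ** 0.5)
        x = x.view(B, a, b, C).to(torch.float32)
        x = torch.fft.rfft2(x, dim=(1, 2), norm='ortho')    # to frequency domain
        x = x * torch.view_as_complex(self.complex_weight)  # learnable filter
        x = torch.fft.irfft2(x, s=(a, b), dim=(1, 2), norm='ortho')  # back to spatial
        return x.reshape(B, N, C)
```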

Is there any reason for adopting the current block architecture?

Thanks,

raoyongming commented 2 years ago

Hi, thanks for your interest in our work.

This modification leads to an accuracy improvement of around 0.1% on ImageNet for the GFNet-H-Ti model. We also found that using a single residual connection in each block stabilizes the training of deeper models, since the total number of residual connections is halved. A similar design is also used in recent work like ConvNeXt [1].

[1] A ConvNet for the 2020s, https://arxiv.org/abs/2201.03545
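To make the two layouts concrete, here is a minimal PyTorch sketch (the module and constructor names are illustrative, not the exact ones in this repo):

```python
import torch.nn as nn

class TwoResidualBlock(nn.Module):
    # Standard Transformer layout: one residual around the token mixer
    # (e.g. self-attention) and one around the FFN.
    def __init__(self, dim, mixer, ffn):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mixer, self.ffn = mixer, ffn

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))  # residual 1
        x = x + self.ffn(self.norm2(x))    # residual 2
        return x

class SingleResidualBlock(nn.Module):
    # GFNet-style layout: one residual spanning the global filter and the FFN.
    def __init__(self, dim, global_filter, ffn):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.filter, self.ffn = global_filter, ffn

    def forward(self, x):
        return x + self.ffn(self.norm2(self.filter(self.norm1(x))))  # single residual
```

With the single-residual layout, a depth-L network has L residual branches instead of 2L, which is the halving mentioned above.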

hsi1032 commented 2 years ago

Thank you for the quick reply!