sail-sg / poolformer

PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
https://arxiv.org/abs/2111.11418
Apache License 2.0

why use use_layer_scale #35

Closed rtfgithub closed 2 years ago

rtfgithub commented 2 years ago

Thanks for your great contribution! In the implementation of PoolFormerBlock, there is a layer_scale applied after the token_mixer. What is the impact of this operation?

yuweihao commented 2 years ago

Hi @rtfgithub ,

Like stochastic depth, LayerScale helps with training the models. For more details, please refer to the following paper, which proposes this operator.

Going deeper with Image Transformers
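To make the operation concrete, here is a minimal sketch of what LayerScale does in a residual block: the token mixer's output is multiplied channel-wise by a small learnable vector before the residual addition. This is an illustrative example, not the repo's actual code; the function and variable names (`layer_scale_block`, `mixer`, `scale`) are made up for this sketch.

```python
import numpy as np

def layer_scale_block(x, mixer, scale):
    """LayerScale residual: x has shape (tokens, channels),
    scale is a learnable per-channel vector, typically initialized
    to a small value such as 1e-5 so the branch starts near-identity."""
    return x + scale * mixer(x)  # per-channel scaling via broadcasting

channels = 4
x = np.ones((3, channels))
scale = np.full(channels, 1e-5)   # small init damps the mixer branch early in training
mixer = lambda t: 2.0 * t         # stand-in for the pooling token mixer
y = layer_scale_block(x, mixer, scale)
# each output entry is 1 + 1e-5 * 2 = 1.00002
```

Because `scale` starts near zero, the block initially behaves close to an identity mapping, which (like stochastic depth) eases optimization of deeper models.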

rtfgithub commented 2 years ago

Thanks for your reply! I'd like to ask you another question. As mentioned in your paper, the training strategy of PoolFormer follows the DeiT method. Did you add a distillation token to the model and use the distillation method with the hard-label distillation loss?

yuweihao commented 2 years ago

Hi @rtfgithub ,

We follow the training hyper-parameters of DeiT but we don't add distillation methods.

rtfgithub commented 2 years ago

Thank you for your reply! Good luck!

yuweihao commented 2 years ago

You are welcome :)