microsoft / Cream

This is a collection of our NAS and Vision Transformer work.
MIT License

What is the purpose of class 'RelativePosition2D' in AutoFormer? #145

Closed: mxjecho closed this issue 1 year ago

mxjecho commented 1 year ago

Hi, I'm confused about why the class 'RelativePosition2D_super' is used in AutoFormer. There is no such operation in the ViT code in timm.

wkcn commented 1 year ago

Hi @mxjecho, thanks for your interest in our work!

RelativePosition2D is not part of the original ViT. It was introduced to capture local information and improve accuracy.
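To make the idea concrete, here is a minimal pure-Python sketch of how a 2D relative position term can be built for a grid of patches. It is not the AutoFormer implementation: the function names (`relative_position_index_2d`, `relative_bias`), the scalar lookup tables, and the choice to sum a vertical and a horizontal term are all assumptions made for illustration. The point it shows is that the term depends only on the clipped row and column offsets between two patches, so nearby patches share learned parameters regardless of where they sit in the image.

```python
def relative_position_index_2d(grid_h, grid_w, max_dist):
    """Return two (N x N) index tables, N = grid_h * grid_w.

    rows[i][j] / cols[i][j] are the clipped vertical / horizontal offsets
    from query patch i to key patch j, shifted into [0, 2 * max_dist].
    """
    # Patches are enumerated in row-major order, as in a flattened ViT grid.
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]

    def clip(d):
        # Offsets beyond max_dist share one bucket; shift to a non-negative index.
        return max(-max_dist, min(max_dist, d)) + max_dist

    rows = [[clip(kr - qr) for (kr, _) in coords] for (qr, _) in coords]
    cols = [[clip(kc - qc) for (_, kc) in coords] for (_, qc) in coords]
    return rows, cols


def relative_bias(rows, cols, table_v, table_h):
    """Look up a (hypothetical) scalar per offset and sum both directions.

    table_v and table_h stand in for learned parameters; each needs
    2 * max_dist + 1 entries.
    """
    return [
        [table_v[rv] + table_h[ch] for rv, ch in zip(row_idx, col_idx)]
        for row_idx, col_idx in zip(rows, cols)
    ]


if __name__ == "__main__":
    rows, cols = relative_position_index_2d(grid_h=2, grid_w=2, max_dist=1)
    bias = relative_bias(rows, cols, [0.0, 1.0, 2.0], [0.0, 10.0, 20.0])
    for line in bias:
        print(line)
```

In a real model the lookup would typically return learned vectors rather than scalars, and the result would be combined with the attention computation. The sketch leaves that part out, along with any special handling of the class token.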

mxjecho commented 1 year ago

@wkcn Thanks for your reply. Is there any relevant paper for reference?

wkcn commented 1 year ago

Our paper, "Rethinking and Improving Relative Position Encoding for Vision Transformer", discusses the effects of 2D relative position encoding on vision transformers :)

mxjecho commented 1 year ago

Thanks!