Closed mxjecho closed 1 year ago
Hi @mxjecho , thanks for your attention to our work!
RelativePosition2D is not part of the original ViT. It was introduced to capture local information, which improves accuracy.
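For anyone unfamiliar with the idea, here is a minimal sketch of how a 2D relative position index can be built for a grid of patches. It is an illustration only, not the AutoFormer code: the function name `relative_index_2d` and the parameters `grid_h`, `grid_w`, and `max_dist` are hypothetical, and the class token is ignored. Each pair of patches gets a clipped vertical offset and a clipped horizontal offset, and each offset would then select a row of a learnable embedding table with `2 * max_dist + 1` entries.

```python
def relative_index_2d(grid_h, grid_w, max_dist):
    """Return (vertical, horizontal) index tables for all patch pairs.

    Offsets are clipped to [-max_dist, max_dist] and shifted by max_dist so
    they are non-negative and can index a table of size 2 * max_dist + 1.
    """
    # Row-major (row, col) coordinates of every patch in the grid.
    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]

    def clip(d):
        return max(-max_dist, min(max_dist, d)) + max_dist

    # Entry [q][k] is the clipped offset from query patch q to key patch k.
    vertical = [[clip(rk - rq) for (rk, _) in coords] for (rq, _) in coords]
    horizontal = [[clip(ck - cq) for (_, ck) in coords] for (_, cq) in coords]
    return vertical, horizontal


vertical, horizontal = relative_index_2d(grid_h=2, grid_w=3, max_dist=1)
```

Because nearby patches get distinct indices while distant ones share the clipped value, the attention layer can treat close neighbours differently from far ones, which is one way to read "capturing local information".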
@wkcn Thanks for your reply. Is there any relevant paper for reference?
Our paper "Rethinking and Improving Relative Position Encoding for Vision Transformer" discusses the effects of 2D relative position encoding on vision transformers :)
Thanks!
Hi, I'm confused as to why the class 'RelativePosition2D_super' is used in AutoFormer. There is no such operation in the ViT code in timm.