yitu-opensource / T2T-ViT

ICCV2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

About the attention part in token_transformer #68

Closed stillwaterman closed 2 years ago

stillwaterman commented 2 years ago

Hello, I would like to ask what the purpose of the skip connection `x = v.squeeze(1) + x` is; it is not mentioned in your paper. The comment in your code says `# because the original x has different size with current x, use v to do skip connection`, but "use a skip connection because the original x and the current x have different sizes" does not quite make sense to me, since a skip connection does not change the size. So I would like to know what this skip connection is for. I also noticed `qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.in_dim).permute(2, 0, 3, 1, 4)` and `x = (attn @ v).transpose(1, 2).reshape(B, N, self.in_dim)`: it seems these lines only work when `self.num_heads` equals 1, is that correct? And does `self.in_dim` mean `head_dim`?
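For context, below is a minimal sketch of the attention block being discussed, reconstructed only from the lines quoted above. The `qkv` reshape, the `(attn @ v)` reshape, and the `v.squeeze(1) + x` residual follow the quoted code; everything else (bias, scaling, softmax, the `proj` layer, and the hypothetical class name `TokenAttentionSketch`) is an assumption added to make the shapes concrete, not the repository's actual implementation.

```python
import torch
import torch.nn as nn


class TokenAttentionSketch(nn.Module):
    """Sketch of the token_transformer attention, reconstructed from the
    snippets quoted in the question. Note that qkv projects from `dim` to
    `in_dim`, so the output channel count differs from the input's."""

    def __init__(self, dim, in_dim, num_heads=1):
        super().__init__()
        self.num_heads = num_heads
        self.in_dim = in_dim
        # scale and proj are assumptions for a self-contained example
        self.scale = (in_dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, in_dim * 3, bias=False)
        self.proj = nn.Linear(in_dim, in_dim)

    def forward(self, x):
        B, N, C = x.shape
        # The quoted reshape uses self.in_dim as the per-head size, so the
        # element count only matches when num_heads == 1 (the point raised
        # in the question); with num_heads > 1 this reshape would fail.
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.in_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]          # each: (B, num_heads, N, in_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)

        x = (attn @ v).transpose(1, 2).reshape(B, N, self.in_dim)
        x = self.proj(x)

        # The residual: the original input has C channels but the output has
        # in_dim channels, so the shortcut is taken from v (already in_dim
        # channels) rather than from the original input.
        x = v.squeeze(1) + x                      # (B, N, in_dim)
        return x


if __name__ == "__main__":
    x = torch.randn(2, 196, 64)
    attn = TokenAttentionSketch(dim=64, in_dim=32, num_heads=1)
    print(attn(x).shape)  # torch.Size([2, 196, 32]) -- channels change from 64 to 32
```

The sketch illustrates the shape mismatch behind the code comment: since `qkv` maps `dim` to `in_dim`, the block's output cannot be added to the original `x`, so `v` serves as the residual branch instead.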