sail-sg / volo

VOLO: Vision Outlooker for Visual Recognition
Apache License 2.0
929 stars 94 forks

Confusion in class OutlookAttention module #12

Closed axhiao closed 3 years ago

axhiao commented 3 years ago

In class `OutlookAttention`, there is `self.v = nn.Linear(dim, dim, bias=qkv_bias)`, and the input to this class is `x`, whose shape is unpacked as `B, H, W, C = x.shape`. My question is how the line `v = self.v(x).permute(0, 3, 1, 2)  # B, C, H, W` can run without an exception, since it performs the matrix multiplication [B, H, W, C] * [dim, dim]. Also, Algorithm 1 in the original paper implements `v_pj = nn.Linear(C, C)`, but in your code `C` is replaced with `dim`. Thanks!

yuanli2333 commented 3 years ago

`dim` in Transformer and Outlooker is the channel dimension C, so `dim = C`.
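
For readers with the same question, here is a minimal sketch of why the call works. `nn.Linear` applies its weight to the last dimension of the input only and treats all leading dimensions as batch dimensions, so a `[B, H, W, C]` tensor is valid as long as `C` equals `in_features`. The tensor sizes and the names `v_proj`, `manual`, `same`, and `mismatch_raises` below are illustrative assumptions, not values or identifiers from the VOLO code.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumed, not taken from any VOLO config).
B, H, W, C = 2, 7, 7, 32
dim = C  # per the reply above: dim is the channel dimension

v_proj = nn.Linear(dim, dim, bias=False)  # stands in for self.v
x = torch.randn(B, H, W, C)

# nn.Linear acts on the last dimension; B, H, W are carried through unchanged.
out = v_proj(x)               # shape: [B, H, W, dim]
v = out.permute(0, 3, 1, 2)   # shape: [B, dim, H, W]

# Equivalent view: flatten the leading dimensions, do a plain 2-D matmul
# with the transposed weight, then restore the original layout.
manual = (x.reshape(-1, C) @ v_proj.weight.t()).reshape(B, H, W, dim)
same = torch.allclose(out, manual, atol=1e-5)

# If the last dimension did not match in_features, the call would fail.
mismatch_raises = False
try:
    nn.Linear(16, 16)(x)  # last dim is 32, layer expects 16
except RuntimeError:
    mismatch_raises = True

print(tuple(out.shape), tuple(v.shape), same, mismatch_raises)
```

So the multiplication is not a literal `[B, H, W, C] * [dim, dim]` product over the whole tensor. Each length-`C` channel vector at every spatial position is multiplied by the same `dim x dim` weight.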