In class `OutlookAttention`, there is `self.v = nn.Linear(dim, dim, bias=qkv_bias)`, and the input to this class is `x`, with shape `B, H, W, C = x.shape`. My question is how the line `v = self.v(x).permute(0, 3, 1, 2)  # B, C, H, W` runs without raising an exception, given that it performs the matrix multiplication `[B, H, W, C] * [dim, dim]`. Also, in the original paper, Algorithm 1 implements `v_pj = nn.Linear(C, C)`, but in your code `C` is replaced with `dim`. Thanks!
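For reference, here is a minimal sketch of the shapes being asked about. It assumes that `dim` is passed in equal to the channel count `C` (an assumption, not something confirmed from the repository), and the sizes and the name `v_proj` are hypothetical. As far as I understand, `nn.Linear` acts only on the last dimension of its input and treats all leading dimensions as batch dimensions, which would explain why no exception is raised:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
B, H, W, C = 2, 4, 4, 8
dim = C  # assumption: the module is constructed with dim equal to C

v_proj = nn.Linear(dim, dim, bias=False)  # stands in for self.v
x = torch.randn(B, H, W, C)

# nn.Linear is applied over the last dimension; B, H, W are kept as-is.
out = v_proj(x)               # shape (B, H, W, dim)
v = out.permute(0, 3, 1, 2)   # shape (B, dim, H, W)

# The same result via an explicit 2-D matmul on the flattened input.
flat = x.reshape(-1, C) @ v_proj.weight.t()
same = torch.allclose(out, flat.reshape(B, H, W, dim), atol=1e-6)

print(out.shape, v.shape, same)
```

If `dim` were different from `C`, the call `v_proj(x)` would fail with a shape mismatch, so under this reading `dim` and `C` are two names for the same number.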