raoyongming / DenseCLIP

[CVPR 2022] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

shape related question #55

Open SKevin673 opened 3 months ago

SKevin673 commented 3 months ago

According to the code in detection\denseclip\models.py, lines 525 to 532:

x = x.permute(1, 0, 2)  # NLD -> LND

features = []
for i, blk in enumerate(self.transformer.resblocks):
    x = blk(x)
    if i in self.out_indices:
        xp = x[:, 1:, :].permute(0, 2, 1).reshape(B, -1, H, W)
        features.append(xp.contiguous())

and the code in detection\denseclip\models.py, line 322:

self.attn = nn.MultiheadAttention(d_model, n_head)

batch_first is not set when self.attn is defined in class ResidualAttentionBlock, so it should take its default value of False, which means the attention layer expects sequence-first input of shape (L, N, C). Question: why is the shape of x (B, H*W+1, C), so that the shape of xp can be (B, C, H, W)?
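To make the shape question concrete, here is a minimal sketch that only tracks shapes. It uses NumPy arrays as stand-ins for the tensors, and the sizes B, H, W, C are hypothetical values picked for illustration, not taken from the repository. It compares slicing dim 1 directly in the LND layout (as in the quoted lines) with permuting back to NLD before slicing.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
B, H, W, C = 2, 3, 4, 8
L = H * W + 1  # patch tokens plus one class token

# Sequence-first layout (L, B, C), i.e. what x is after "NLD -> LND".
x_lnd = np.zeros((L, B, C))

# Slicing dim 1 in this layout drops one batch element,
# not the class token.
sliced_as_quoted = x_lnd[:, 1:, :]
print(sliced_as_quoted.shape)  # (13, 1, 8)

# Going back to batch-first (B, L, C) before slicing drops the class token.
x_nld = x_lnd.transpose(1, 0, 2)
patch_tokens = x_nld[:, 1:, :]  # (B, H*W, C)
xp = patch_tokens.transpose(0, 2, 1).reshape(B, -1, H, W)
print(xp.shape)  # (2, 8, 3, 4)
```

With these assumed sizes, only the second path yields (B, C, H, W), which is why the (B, H*W+1, C) reading of x in the quoted lines looks inconsistent with the preceding permute.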