xmu-xiaoma666 / External-Attention-pytorch

🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.⭐⭐⭐
MIT License
11.29k stars 1.92k forks source link

mobilevit's structure output is not consistent with the paper #49

Open wilbur-caper opened 2 years ago

wilbur-caper commented 2 years ago

Thank you for the great work. Here is the paper's graph: ![Uploading image.png…]()

I printed the layer inputs and outputs below:

```
0 fc x.shape torch.Size([1, 3, 224, 224])
1 fc y.shape torch.Size([1, 16, 112, 112])
2 fc y.shape torch.Size([1, 16, 112, 112])
3 fc y.shape torch.Size([1, 24, 112, 112])
4 fc y.shape torch.Size([1, 24, 112, 112])
5 fc y.shape torch.Size([1, 24, 112, 112])
m_vits 1 b y.shape torch.Size([1, 48, 112, 112])
m_vits 1 b y.shape torch.Size([1, 48, 112, 112])
m_vits 2 b y.shape torch.Size([1, 64, 112, 112])
m_vits 2 b y.shape torch.Size([1, 64, 112, 112])
m_vits 3 b y.shape torch.Size([1, 80, 112, 112])
m_vits 3 b y.shape torch.Size([1, 80, 112, 112])
2222 fc y.shape torch.Size([1, 320, 112, 112])
3 fc y.shape torch.Size([1, 3595520])
```

wilbur-caper commented 2 years ago

(image attachment)

wilbur-caper commented 2 years ago

If the input is 1×3×224×224, the spatial sizes of the layer outputs should be 112, 56, 28, 14, 7, 1.
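For reference, that expected sequence follows from the standard convolution output-size formula, assuming a 3×3 kernel with padding 1 and stride 2 at each downsampling stage, followed by a global pool (a sketch of the arithmetic, not the repo's exact code):

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    # standard conv output-size formula: floor((H + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 224
sizes = []
for _ in range(5):      # five stride-2 downsampling stages
    size = conv_out(size)
    sizes.append(size)
sizes.append(1)         # final global average pool collapses 7x7 -> 1x1
print(sizes)            # [112, 56, 28, 14, 7, 1]
```

The printed shapes in the comment above stay at 112×112 throughout, which shows the stride-2 stages are not actually downsampling.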

nick70422 commented 9 months ago

I have encountered the same issue; it's in `MV2Block` in MobileViT.py, line 134:

nn.Conv2d(inp,hidden_dim,kernel_size=1,stride=1,bias=False),

You should change `stride=1` to `stride=self.stride`. Hope it helps.
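To see why the hard-coded `stride=1` freezes the resolution at 112, here is a minimal plain-Python trace of the spatial size through the inverted-residual convs, with and without the fix. The helper names are hypothetical, and I am assuming the stride belongs on the 1×1 expansion conv as in the suggested fix:

```python
def conv_out(size, kernel, stride, padding):
    # standard conv output-size formula: floor((H + 2p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1

def mv2block_out(size, block_stride, bug=False):
    # With the bug, the expansion conv's stride is hard-coded to 1,
    # so the block never downsamples regardless of block_stride.
    s = 1 if bug else block_stride
    size = conv_out(size, kernel=1, stride=s, padding=0)  # 1x1 expansion conv
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 depthwise conv
    return conv_out(size, kernel=1, stride=1, padding=0)  # 1x1 projection conv

size_buggy = size_fixed = 112
for stride in (2, 2, 2, 2):   # four stride-2 stages after the stem
    size_buggy = mv2block_out(size_buggy, stride, bug=True)
    size_fixed = mv2block_out(size_fixed, stride, bug=False)

print(size_buggy)  # 112 -- resolution never shrinks, matching the issue's log
print(size_fixed)  # 7  -- matches the paper's expected 112, 56, 28, 14, 7
```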