Questions about the size of feature maps

Hello, and thank you for your remarkable work. I noticed that you changed the stride of the second conv block in the stem from 2 to 1, so the stem outputs an H/2 × W/2 feature map. After the first "Joint CNN & Transformer Layer", the feature map is downsampled by a factor of 2 again, to H/4 × W/4. According to the MPViT paper, however, the first "Joint CNN & Transformer Layer" does not seem to change the height or width of the feature map. Did you make any additional changes?
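To make the sizes I am asking about concrete, here is a minimal sketch of the arithmetic. It is not taken from the MonoViT or MPViT code: the 3×3 kernel with padding 1, the 192 × 640 input, the helper names `conv_out` and `trace`, and the single stride-2 step after the stem are all my own assumptions for illustration.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Standard conv output-size formula: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def trace(h, w, strides):
    """Return the (height, width) after each conv in a chain of given strides."""
    shapes = []
    for s in strides:
        h, w = conv_out(h, stride=s), conv_out(w, stride=s)
        shapes.append((h, w))
    return shapes


# Hypothetical input resolution, chosen only for illustration.
H, W = 192, 640

# Stem as I read it in the MPViT paper: both conv blocks use stride 2 -> H/4 x W/4.
mpvit_stem = trace(H, W, [2, 2])

# Stem as I read it in this repo: second conv block uses stride 1 -> H/2 x W/2.
modified_stem = trace(H, W, [2, 1])

# Assumed extra stride-2 step somewhere in or before the first stage -> H/4 x W/4.
after_first_stage = trace(*modified_stem[-1], [2])

print("MPViT stem output:   ", mpvit_stem[-1])
print("Modified stem output:", modified_stem[-1])
print("After first stage:   ", after_first_stage[-1])
```

If this reading is right, both variants reach H/4 × W/4 by the end of the first stage, and the only difference is where the second factor of 2 is applied. That is the part I would like to confirm.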

zxcqlf / MonoViT

Questions about the size of feature maps #23