Closed · rui-cf closed this issue 2 years ago
The feature map size should be divisible by the split size. For example, with image_size=224, the feature map size at stage 3 is 14 and the default split size is 7; for input image size 384 we use split size 12.
We add padding to allow a flexible split size for downstream tasks, but we do not use it in the ImageNet model because the padding is not efficient.
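To illustrate the constraint and the padding idea, here is a minimal PyTorch sketch. This is not the repository's actual code, and pad_to_split_size is a hypothetical helper name; it just pads the bottom/right of a feature map so its height and width become divisible by the split size:

import torch
import torch.nn.functional as F

def pad_to_split_size(x, split_size):
    # Hypothetical helper (not the CSWin repo code): pad a (B, C, H, W)
    # feature map on the bottom/right so H and W are divisible by split_size.
    _, _, H, W = x.shape
    pad_h = (split_size - H % split_size) % split_size
    pad_w = (split_size - W % split_size) % split_size
    return F.pad(x, (0, pad_w, 0, pad_h))  # pad order: (left, right, top, bottom)

# Stage-3 feature map at image_size=224: 224 / 16 = 14.
x = torch.rand(1, 384, 14, 14)
print(pad_to_split_size(x, 7).shape)   # torch.Size([1, 384, 14, 14]) -- 14 % 7 == 0, no padding needed
print(pad_to_split_size(x, 12).shape)  # torch.Size([1, 384, 24, 24]) -- padded up to a multiple of 12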
Hi,What does the padding do?
Hi, do you know what the padding does, or how I can find the padding code?
Is the table's list the same as the code? Which one is right? Thank you.
model = CSWinTransformer(patch_size=4, embed_dim=96, depth=[2,4,32,2], split_size=[1,2,12,12], num_heads=[4,8,16,32], mlp_ratio=4.).cuda().eval()
inp = torch.rand(1, 3, 224, 224).cuda()
outs = model(inp)
for out in outs:
    print(out.shape)
RuntimeError: shape '[1, 192, 1, 14, 1, 12]' is invalid for input of size 37632
Why?
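The numbers in the error are consistent with the divisibility rule quoted above: with image_size=224 the stage-3 feature map is 14x14, CSWin attends over half of the channels per stripe orientation (384 / 2 = 192), and 192 * 14 * 14 = 37632, which cannot be reshaped into [1, 192, 1, 14, 1, 12] because 14 is not divisible by the split size 12. A minimal sketch that reproduces the failure, with shapes taken from the error message (the reshape below is an approximation of the repo's img2windows, not the exact code):

import torch

B, C, H, W = 1, 192, 14, 14  # shapes taken from the error message
x = torch.rand(B, C, H, W)
H_sp, W_sp = 14, 12          # full-height stripes of width 12
# x has 192 * 14 * 14 = 37632 elements, but the target shape holds
# 192 * 1 * 14 * 1 * 12 = 32256 -- hence the RuntimeError.
x.view(B, C, H // H_sp, H_sp, W // W_sp, W_sp)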