microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License

How can we set the window size for a 112x112 image? #96

Open khawar-islam opened 3 years ago

khawar-islam commented 3 years ago

Hi

Thank you for your great work. My image size is 112x112, the number of heads is 12, and my window size is 7. It does not work for me.

Traceback

  File "/raid/khawar/PycharmProjects/thesis/vit_pytorch/SwinT/swin.py", line 111, in forward
    q, k, v = map(
  File "/raid/khawar/PycharmProjects/thesis/vit_pytorch/SwinT/swin.py", line 112, in <lambda>
    lambda t: rearrange(t, 'b (nw_h w_h) (nw_w w_w) (h d) -> b h (nw_h nw_w) (w_h w_w) d',
  File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/einops/einops.py", line 424, in rearrange
    return reduce(tensor, pattern, reduction='rearrange', **axes_lengths)
  File "/raid/khawar/anaconda3/envs/vision-transformer-pytorch/lib/python3.8/site-packages/einops/einops.py", line 376, in reduce
    raise EinopsError(message + '\n {}'.format(e))
einops.EinopsError:  Error while processing rearrange-reduction pattern "b (nw_h w_h) (nw_w w_w) (h d) -> b h (nw_h nw_w) (w_h w_w) d".
 Input tensor shape: torch.Size([1, 3, 3, 384]). Additional info: {'h': 24, 'w_h': 7, 'w_w': 7}.
 Shape mismatch, can't divide axis of length 3 in chunks of 7

Regards, Khawar

Nial4 commented 3 years ago

I noticed that many people use 224x224 images as their dataset, and 224 is divisible by 7. I also hit this problem when using 512x512 images. To make the sizes divisible, I changed the default window_size=7 in models/swin_transformer.py to 8 :)
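For example (a minimal sketch using this repo's `SwinTransformer` from `models/swin_transformer.py` with its Swin-T defaults; adjust to your own config):

```python
import torch
from models.swin_transformer import SwinTransformer  # this repo's model definition

model = SwinTransformer(
    img_size=512,                 # 512 / 4 = 128 after patch embedding
    patch_size=4,
    window_size=8,                # stages see 128, 64, 32, 16: all divisible by 8
    embed_dim=96,
    depths=[2, 2, 6, 2],
    num_heads=[3, 6, 12, 24],
)

x = torch.randn(1, 3, 512, 512)
logits = model(x)                 # (1, 1000); no shape mismatch at any stage
```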

khawar-islam commented 3 years ago

@Nial4 I am facing the same issue with size 112, but it still does not work. Any advice?

Nial4 commented 3 years ago

Because the image size is divided by 4 at patch embedding and then halved from stage to stage, every stage's feature map must work with the 7x7 window, and for 112 the last stage ends up at 3x3. So I suggest you try resizing the image to 128x128, and then change window_size=8 in the SwinTransformer and WindowAttention __init__.
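To see why, here is a quick check of the per-stage feature-map sizes (plain Python, nothing repo-specific; note the official blocks clamp the window when the map is smaller than the window):

```python
# Walk the four stages, halving the feature map as patch merging does,
# and flag any stage that breaks window partition or patch merging.
def check_sizes(img_size, patch_size=4, window_size=7, num_stages=4):
    side = img_size // patch_size  # resolution after patch embedding
    for stage in range(num_stages):
        # side <= window_size is fine too: the official blocks clamp the
        # window to the feature map when the map is smaller than the window
        window_ok = side % window_size == 0 or side <= window_size
        print(f"stage {stage}: {side}x{side} feature map, window ok: {window_ok}")
        if stage < num_stages - 1 and side % 2:
            print(f"  warning: {side} is odd, patch merging will fail here")
        side //= 2

check_sizes(112, window_size=7)  # 28, 14, 7, 3: merging the odd 7x7 map fails
check_sizes(128, window_size=8)  # 32, 16, 8, 4: all even, last window clamps to 4
```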

ancientmooner commented 3 years ago

> My image size is 112x112, the number of heads is 12, and my window size is 7. It does not work for me. [...] Shape mismatch, can't divide axis of length 3 in chunks of 7

You may change the window size of the last stage to 3x3 or 4x4 (the size of its feature map). Another solution is to pad the feature map up to a multiple of the window size.
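The padding route sketched below mirrors what the detection/segmentation variants of Swin do before window partition (the function name and layout here are illustrative, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

def pad_to_window_multiple(x, window_size):
    """x: (B, H, W, C) feature map; pad H and W up to multiples of window_size."""
    B, H, W, C = x.shape
    pad_b = (window_size - H % window_size) % window_size
    pad_r = (window_size - W % window_size) % window_size
    # F.pad pads from the last dim backwards: (C_l, C_r, W_l, W_r, H_l, H_r)
    x = F.pad(x, (0, 0, 0, pad_r, 0, pad_b))
    return x, (H, W)  # keep the original size to crop back after attention

x = torch.randn(1, 3, 3, 384)           # the 3x3 last-stage map from the traceback
x_padded, orig_hw = pad_to_window_multiple(x, window_size=7)
print(x_padded.shape)                   # torch.Size([1, 7, 7, 384])
```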

scott870430 commented 3 years ago

Hi @Nial4, following your suggestion I modified window_size to 8 when training the model on 512x512 images. When loading the pre-trained weights I hit a size mismatch on relative_position_index and the attention heads. Will these changes hurt the performance of the Swin Transformer, and how can I deal with them? I set strict=False, but loading the pre-trained weights still raises an error.

Is there any documentation on fine-tuning the Swin Transformer? I have no idea how to proceed. Thanks in advance for your reply!

Nial4 commented 3 years ago

Hi @scott870430, the parameters in the pre-trained model are fixed; you cannot modify them. You can only pre-process your dataset (resize, for example), or modify the window size and retrain. I have used Swin models a lot recently, and I don't think this significantly affects performance.

scott870430 commented 3 years ago

Hi @Nial4, thanks for your reply. Does your retraining start from the pre-trained weights? If I modify the window size and retrain, can I use the pre-trained weights and just remove the mismatched parameters?

Thanks!
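For reference, removing the shape-mismatched parameters before loading could look roughly like this (a hedged sketch; the checkpoint path and model config below are illustrative):

```python
import torch
from models.swin_transformer import SwinTransformer

# model with the new window size (example config for 512x512 training)
model = SwinTransformer(img_size=512, window_size=8, embed_dim=96,
                        depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24])

ckpt = torch.load("swin_tiny_patch4_window7_224.pth", map_location="cpu")
state_dict = ckpt["model"]

# keep only parameters whose shapes match the new model; strict=False alone
# skips missing keys but still raises on shape mismatches, hence the filter
model_state = model.state_dict()
filtered = {k: v for k, v in state_dict.items()
            if k in model_state and v.shape == model_state[k].shape}
model.load_state_dict(filtered, strict=False)
print("dropped:", sorted(set(state_dict) - set(filtered)))
```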

FrankWuuu commented 3 years ago

Hi @Nial4, I have a data problem; could you give me some advice? I use the Swin Transformer as my backbone for segmentation. My training size is 256 and window_size is set to 8, but when I train I get:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [12, 192, 1, 1], but got 3-dimensional input of size [16, 1, 1] instead

I have tried many things, such as adding images = torch.unsqueeze(images, dim=0), but they failed. Thanks for your time. Best
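One hedged way to debug this (the hook target below is hypothetical; the point is to inspect what actually reaches the failing 1x1 conv rather than unsqueezing blindly):

```python
import torch

def report_shape(module, inputs):
    # forward pre-hook: print what actually reaches the module
    print(type(module).__name__, "got input of shape", tuple(inputs[0].shape))

# hypothetical usage on whichever layer raises the error, e.g.:
# handle = model.decode_head.conv_seg.register_forward_pre_hook(report_shape)

# the 1x1 conv with weight [12, 192, 1, 1] expects (B, 192, H, W), and the
# backbone itself expects image batches shaped (B, 3, H, W):
images = torch.randn(16, 3, 256, 256)
assert images.dim() == 4 and images.size(1) == 3, images.shape
```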

zhuole1025 commented 3 years ago

Hi @Nial4, thanks for your suggestions! But I am not sure: is it OK to directly use the pre-trained weights to fine-tune my model on a different task with a modified window size? Thanks!

daixiangzi commented 3 years ago

> RuntimeError: Expected 4-dimensional input for 4-dimensional weight [12, 192, 1, 1], but got 3-dimensional input of size [16, 1, 1] instead

+1

ancientmooner commented 2 years ago

Please go to Swin V2 for an approach to dealing with varying window resolutions.

ancientmooner commented 2 years ago

@scott870430 You can try bicubic interpolation of the relative position bias table to leverage the pre-trained model weights with a different window size.
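A hedged sketch of what that interpolation can look like for `relative_position_bias_table` (shapes follow `models/swin_transformer.py`; the 7 to 8 window change is just an example):

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias(table, old_window=7, new_window=8):
    """table: ((2*old-1)^2, num_heads) -> ((2*new-1)^2, num_heads)."""
    L, n_heads = table.shape          # L = (2 * old_window - 1) ** 2
    old_side = 2 * old_window - 1
    new_side = 2 * new_window - 1
    # (L, nH) -> (1, nH, old_side, old_side) so F.interpolate can act on it
    t = table.permute(1, 0).reshape(1, n_heads, old_side, old_side)
    t = F.interpolate(t, size=(new_side, new_side),
                      mode="bicubic", align_corners=False)
    return t.reshape(n_heads, new_side * new_side).permute(1, 0)

table = torch.randn((2 * 7 - 1) ** 2, 24)   # 169 x 24, as in a 24-head block
print(resize_rel_pos_bias(table).shape)     # torch.Size([225, 24])
```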