microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
https://arxiv.org/abs/2103.14030
MIT License

Finetune Swin-transformer #117

Open scott870430 opened 2 years ago

scott870430 commented 2 years ago

Thanks for the great work. I want to finetune the Swin Transformer at a different resolution, such as 512 x 512. If I only change IMG_SIZE in the config from 224 to 512, I get the error RuntimeError: shape '[1, 18, 7, 18, 7, 1]' is invalid for input of size 16384. I think it is due to the window size?
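For context, the error comes from the window partitioning step. A simplified sketch of window partitioning (following the paper's description, not necessarily the exact repo code) shows why the feature map height and width must be divisible by the window size, otherwise the view call fails with exactly this kind of RuntimeError:

```python
import torch

def window_partition(x, window_size):
    # x: [B, H, W, C]; H and W must be divisible by window_size,
    # otherwise the view below raises a "shape ... is invalid" RuntimeError
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size,
               W // window_size, window_size, C)
    # gather windows: [num_windows * B, window_size, window_size, C]
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# 56 x 56 feature map with window size 7 partitions cleanly into 64 windows
windows = window_partition(torch.zeros(1, 56, 56, 3), 7)
```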

And if I use resolution 448 to finetune the model, I get an error when loading the pre-trained weights: size mismatch for layers.2.blocks.3.attn.relative_position_bias_table. According to this issue, how can I use bicubic interpolation to initialize the relative_position_bias_table? Has it already been implemented in the code, or do I need to implement it myself?
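One possible approach (a sketch under my own assumptions, not the repo's official code): treat the (2W-1) x (2W-1) bias table as a per-head 2-D grid and bicubically resize it to the new window size. The helper name resize_rel_pos_bias_table below is hypothetical:

```python
import torch
import torch.nn.functional as F

def resize_rel_pos_bias_table(table, new_window_size):
    # table: [(2*Ws_old - 1)**2, num_heads] from the pretrained checkpoint
    old_len, num_heads = table.shape
    old_size = int(old_len ** 0.5)           # 2*Ws_old - 1
    new_size = 2 * new_window_size - 1
    # reshape to a 2-D grid per head, interpolate, then flatten back
    t = table.permute(1, 0).reshape(1, num_heads, old_size, old_size)
    t = F.interpolate(t, size=(new_size, new_size),
                      mode='bicubic', align_corners=False)
    return t.reshape(num_heads, new_size * new_size).permute(1, 0)

# e.g. pretrained window size 7 -> 169 rows; resized for window size 8 -> 225 rows
resized = resize_rel_pos_bias_table(torch.randn(169, 4), new_window_size=8)
```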

In Swin-Transformer-Semantic-Segmentation, the model can accept arbitrary resolutions for semantic segmentation. Can I copy the Swin Transformer part of Swin-Transformer-Semantic-Segmentation and use it for classification, so that the model can accept arbitrary resolutions for classification? Does that make sense, or would I need to modify other parts of the model for classification?

Thank you in advance for your help.

FrankWuuu commented 2 years ago

Hello, sorry to bother you. I met the same error: RuntimeError: shape '[1, 18, 7, 18, 7, 1]' is invalid for input of size 7744. Could you please let me know whether you resolved it, and if so, how you avoided it?

scott870430 commented 2 years ago

Hi @FrankWuuu, I think it is due to the window size. You need to choose a window size that evenly divides the feature map at every stage (e.g., window size 7 for 224, or 8 for 512). However, I am not sure how to adapt the pre-trained weights to a different window size. If you know how to do it, please let me know. Thanks.
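To check which image-size/window-size combinations are compatible, one can test divisibility at every stage. This is a sketch assuming the standard 4-stage Swin layout with patch size 4 and 2x downsampling between stages; check_window_compat is a hypothetical helper:

```python
def check_window_compat(img_size, window_size, patch_size=4, num_stages=4):
    # feature map size after patch embedding, halved between successive stages
    size = img_size // patch_size
    for _ in range(num_stages):
        if size % window_size != 0:
            return False
        size //= 2
    return True
```

For example, 224 with window 7 passes (56, 28, 14, 7 are all divisible by 7), while 512 with window 7 fails at the first stage (128 is not divisible by 7); 512 with window 8 passes.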

FrankWuuu commented 2 years ago

Thanks for your help. I'll try changing the window size as you suggest. For now I just want to use the Swin Transformer as a backbone, so I may not use the pretrained weights. Sorry, I'm afraid I can't give you any advice on that part.

FrankWuuu commented 2 years ago

Hi @scott870430, I followed your advice and changed window_size=8 and image_size=256. When I train, I get another problem: RuntimeError: Expected 4-dimensional input for 4-dimensional weight [12, 192, 1, 1], but got 3-dimensional input of size [16, 1, 1] instead. Can you give me some advice? Is there some problem in my dataloader? Thanks.
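The classification model expects 4-D batches of shape [B, 3, H, W], so a [16, 1, 1] input suggests the dataloader is yielding labels or flattened tensors where images are expected. A quick sanity check on a dummy dataset (a sketch, not this user's actual pipeline):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy dataset: 32 RGB images at 256 x 256 with integer class labels
data = TensorDataset(torch.randn(32, 3, 256, 256),
                     torch.randint(0, 10, (32,)))
loader = DataLoader(data, batch_size=16)

# each image batch should be 4-D: [batch, channels, height, width];
# if this prints a 3-D shape, the dataset's __getitem__ is returning
# the wrong tensor (e.g. the label instead of the image)
images, labels = next(iter(loader))
print(images.shape)
```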

Lsz-20 commented 2 years ago

@scott870430 @FrankWuuu Have you perhaps resolved these questions? I hit this issue when running Swin in mmsegmentation. I used my own dataset and added some parts to Swin, and it works well with the 'Swin-Transformer-Semantic-Segmentation' code, but when I switch to the Swin implementation in mmsegmentation, it always shows: size mismatch for stages.0.downsample.norm.weight: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([192]). size mismatch for stages.0.downsample.norm.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([192]). It seems the pretrained model can't be loaded correctly. Thanks, waiting for your answer.
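Before loading, it can help to list which checkpoint keys disagree in shape with the target model; the mismatch here likely comes from the two Swin implementations placing the downsample norm layer differently. A generic sketch (find_shape_mismatches is a hypothetical helper, not part of either repo):

```python
import torch
import torch.nn as nn

def find_shape_mismatches(ckpt_state, model):
    # return checkpoint keys whose tensor shapes don't match the model's
    model_state = model.state_dict()
    return [k for k in ckpt_state
            if k in model_state and ckpt_state[k].shape != model_state[k].shape]

# toy demonstration with a deliberately mismatched checkpoint
model = nn.Linear(4, 2)
ckpt = {'weight': torch.zeros(3, 4), 'bias': torch.zeros(2)}
bad_keys = find_shape_mismatches(ckpt, model)
```

Mismatched keys can then be dropped from the checkpoint dict before calling load_state_dict(..., strict=False), though the dropped layers will of course be randomly initialized.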