microsoft / CSWin-Transformer

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
MIT License

The split size on ADE20K #10

Closed: LTnanana closed this issue 3 years ago

LTnanana commented 3 years ago

Hi, I wonder what split size is used when training on ADE20K with a 512x512 input. Thanks!

LightDXY commented 3 years ago

Hi, we still follow the ImageNet setting of sw=[1,2,7,7]. Enlarging it leads to better performance, but the GPU memory cost and FLOPs also increase.
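
To see where that memory/FLOPs trade-off comes from: with the default 4x patch embedding and resolution halving between stages, a 512x512 crop gives 128/64/32/16 feature maps. The sketch below (not code from this repo, just an illustration) tabulates how many tokens each cross-shaped stripe covers per stage under sw=[1,2,7,7], and whether the stripe tiles the map exactly.

```python
# Minimal sketch (illustrative, not the repo's code): per-stage feature-map
# sizes for a 512x512 input, and the number of tokens one stripe attends over.
# Larger split sizes mean more tokens per stripe, hence more memory and FLOPs.

def stage_resolutions(img_size=512, patch_stride=4, num_stages=4):
    """Feature-map side length at each stage (each stage halves the resolution)."""
    res = img_size // patch_stride
    return [res // (2 ** i) for i in range(num_stages)]

def stripe_tokens(resolution, split_size):
    """Tokens covered by one cross-shaped stripe (split_size x resolution)."""
    return split_size * resolution

if __name__ == "__main__":
    sw = [1, 2, 7, 7]  # default ImageNet setting, kept for ADE20K
    for stage, (res, s) in enumerate(zip(stage_resolutions(), sw), start=1):
        print(f"stage {stage}: {res}x{res} map, split {s}, "
              f"{stripe_tokens(res, s)} tokens per stripe, "
              f"tiles exactly: {res % s == 0}")
```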

AnukritiSinghh commented 3 years ago

Hi @LTnanana, did sw=[1,2,7,7] work for you, or did you change the split size? Hope to get a reply!

Thanks

ydhongHIT commented 2 years ago

> Hi, we still follow the ImageNet setting of sw=[1,2,7,7]. Enlarging it leads to better performance, but the GPU memory cost and FLOPs also increase.

Hi, I tried to enlarge the split size, but the performance drops a little. Do you have any results demonstrating the improvement, or could you provide trained models with a larger split size? Besides, I find that the last stage always uses global self-attention no matter what the split size is. Looking forward to your reply!
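
For context on that last observation: in the cross-shaped window scheme, the final stage sets the stripe size to the full feature-map side, so both attention branches span the whole map and the layer reduces to global self-attention regardless of the configured split size. The following is a simplified sketch of that stripe-size logic, with illustrative names rather than the repo's exact API.

```python
# Simplified sketch (illustrative, not the repo's exact code) of how the
# stripe shape for one attention branch is chosen. When last_stage is set,
# the stripe covers the entire feature map, so the configured split size
# has no effect and the attention is effectively global.

def stripe_shape(resolution, split_size, orientation, last_stage=False):
    """Return (H_sp, W_sp) for one attention branch."""
    if last_stage:
        return resolution, resolution      # degenerates to full attention
    if orientation == "horizontal":
        return split_size, resolution      # stripe spans the full width
    return resolution, split_size          # stripe spans the full height

if __name__ == "__main__":
    # Stage-4 map for a 512x512 input is 16x16: the split size is ignored.
    print(stripe_shape(16, 7, "horizontal", last_stage=True))   # (16, 16)
    # Stage-3 map is 32x32: here the split size does matter.
    print(stripe_shape(32, 7, "horizontal"))                    # (7, 32)
```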

NikAleksFed commented 2 years ago

> Hi, we still follow the ImageNet setting of sw=[1,2,7,7]. Enlarging it leads to better performance, but the GPU memory cost and FLOPs also increase.

My question: is it possible to use split size = 7 for 512x512 inputs? I tested that scenario and got an error, Exception: shape '[1, 192, 1, 32, 4, 7]' is invalid for input of size 196608, because 32 is not divisible by 7. When I replaced 7 with 8, everything worked.
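
For reference, the constraint behind that error can be reproduced with a minimal sketch (illustrative, not the repo's actual img2windows code): partitioning an HxW feature map into H_sp x W_sp stripes relies on a reshape to (B, C, H // H_sp, H_sp, W // W_sp, W_sp), which only succeeds when the stripe sizes divide the map sizes. That is why 7 fails on a 32x32 map while 8 works.

```python
import torch

# Minimal repro of the divisibility constraint (illustrative, not the repo's
# code): a 32x32 map cannot be tiled by stripes of width 7, but width 8 works.

def partition_into_stripes(x, h_sp, w_sp):
    b, c, h, w = x.shape
    if h % h_sp or w % w_sp:
        raise ValueError(f"stripe {h_sp}x{w_sp} does not tile a {h}x{w} map")
    return x.reshape(b, c, h // h_sp, h_sp, w // w_sp, w_sp)

x = torch.zeros(1, 192, 32, 32)          # 32x32 feature map, as in the error
# partition_into_stripes(x, 32, 7)       # fails: 32 is not divisible by 7
out = partition_into_stripes(x, 32, 8)   # works: 32 / 8 = 4 stripes
print(out.shape)                         # torch.Size([1, 192, 1, 32, 4, 8])
```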