Clarification on Speed Improvement with `fused_window_process` and Its Necessity for Small-Scale Tasks

microsoft / Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

https://arxiv.org/abs/2103.14030

MIT License


Issue #371 (open), opened by Fanqyu 1 month ago

Hi, thank you for your excellent work!

I have a question about `fused_window_process`. Now that the window process (the cyclic shift and the window partition/merge) is fused into a CUDA kernel, how significant is the speed improvement? Could you share some quantitative data, such as latency or throughput with and without the fused kernel, to illustrate the gains?
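For anyone wanting numbers before a maintainer replies, here is a minimal timing sketch built on `torch.utils.benchmark.Timer`, which handles warm-up and synchronizes asynchronous CUDA work. The input shape (a Swin-T stage-1 feature map at 224x224, window size 7, shift 3) is an assumption, and only the `torch.roll` baseline is timed. The repo's fused kernel would need to be wrapped in your own callable (the name `my_fused_shift_and_partition` below is hypothetical) and passed to the same helper.

```python
import torch
from torch.utils import benchmark


def median_time_ms(fn, x, min_run_time=0.2):
    """Median run time of fn(x) in milliseconds.

    Timer performs warm-up runs and synchronizes CUDA when needed,
    so GPU kernels are not under-reported.
    """
    timer = benchmark.Timer(stmt="fn(x)", globals={"fn": fn, "x": x})
    return timer.blocked_autorange(min_run_time=min_run_time).median * 1e3


# Assumed input: Swin-T stage-1 feature map, (B, H, W, C) = (8, 56, 56, 96).
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 56, 56, 96, device=device)
shift = 3  # window_size // 2 for window_size = 7

# Baseline: the cyclic shift done with torch.roll.
roll_ms = median_time_ms(
    lambda t: torch.roll(t, shifts=(-shift, -shift), dims=(1, 2)), x
)
print(f"torch.roll on {device}: {roll_ms:.3f} ms")

# To compare, time a callable that wraps the fused kernel (hypothetical wrapper):
# fused_ms = median_time_ms(my_fused_shift_and_partition, x)
```

Timing the shift in isolation only shows the upper bound of what fusion can save; the end-to-end gain also depends on how much of each block's time goes to attention and the MLP.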

Additionally, for smaller-scale tasks, is the fused window process necessary, or would the default `torch.roll` implementation be sufficient?
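For reference, the unfused default path looks roughly like the sketch below: a `torch.roll` cyclic shift followed by a separate window partition, and the reverse on the way out. The helpers are modeled on `window_partition`/`window_reverse` in the repo's model code, but this is an illustrative reimplementation, not a copy. `shift_and_partition` and `merge_and_unshift` are hypothetical names, and the attention mask for shifted windows is omitted.

```python
import torch


def window_partition(x, window_size):
    """(B, H, W, C) -> (B * num_windows, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)


def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: (B * num_windows, ws, ws, C) -> (B, H, W, C)."""
    C = windows.shape[-1]
    B = windows.shape[0] // ((H // window_size) * (W // window_size))
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, C)


def shift_and_partition(x, window_size, shift_size):
    # Unfused forward path: torch.roll allocates a shifted copy,
    # then window_partition makes another copy via .contiguous().
    shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))
    return window_partition(shifted, window_size)


def merge_and_unshift(windows, window_size, shift_size, H, W):
    # Unfused reverse path: merge windows, then undo the cyclic shift.
    x = window_reverse(windows, window_size, H, W)
    return torch.roll(x, shifts=(shift_size, shift_size), dims=(1, 2))


x = torch.randn(2, 56, 56, 96)
windows = shift_and_partition(x, window_size=7, shift_size=3)
print(windows.shape)  # torch.Size([128, 7, 7, 96])
restored = merge_and_unshift(windows, 7, 3, 56, 56)
print(torch.equal(restored, x))  # True
```

Each direction materializes two intermediate copies; a fused kernel can do the shift and the partition in one pass. Whether that matters for a small model or small inputs is best settled by timing both paths on the target hardware.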

Looking forward to your response!