ICCV '21 | Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows

ICCV '21 | Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows #27

Closed mental2008 closed 2 years ago

mental2008 commented 2 years ago

Presented at ICCV '21. [ Paper | Supplement | arXiv | Code ] Awarded Best Paper (Marr Prize)!

Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo

Affiliations: Microsoft Research Asia, University of Science and Technology of China, Xi'an Jiaotong University, Tsinghua University

mental2008 commented 2 years ago

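As described in the paper, Swin Transformer is a new vision Transformer that serves as a general-purpose backbone for computer vision. It builds hierarchical feature maps and computes self-attention within local windows that are shifted between layers.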

"Interesting" points:

Excellent results on many tasks. (Are they really that good?)
The authors report that the hierarchical design and the shifted window approach also benefit all-MLP architectures.
The code is open-sourced on GitHub.
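
To make the "shifted window" idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the helper names `cyclic_shift` and `window_partition` are my own, the 4x4 grid and the window size of 2 are toy values chosen for readability, and the attention computation itself is left out. It only shows how the same tokens get grouped differently before and after the shift.

```python
def cyclic_shift(grid, shift):
    """Roll a 2D grid up and to the left by `shift` cells, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


def window_partition(grid, window_size):
    """Split a 2D grid into non-overlapping window_size x window_size windows."""
    h, w = len(grid), len(grid[0])
    assert h % window_size == 0 and w % window_size == 0
    windows = []
    for top in range(0, h, window_size):
        for left in range(0, w, window_size):
            windows.append([row[left:left + window_size]
                            for row in grid[top:top + window_size]])
    return windows


# Toy 4x4 "feature map" whose entries are token ids 0..15 (illustrative only).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]

# Regular partition: attention would be computed inside each 2x2 window.
regular = window_partition(grid, 2)

# Shifted partition: shift by half the window size, then partition again,
# so the new windows straddle the boundaries of the previous ones.
shifted = window_partition(cyclic_shift(grid, 1), 2)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
print(shifted[3])  # [[15, 12], [3, 0]]
```

The first shifted window mixes tokens from all four regular windows, which is how information flows across window boundaries. The last shifted window contains tokens wrapped around from opposite corners of the grid; as far as I understand, the paper masks attention between such non-adjacent tokens, which this sketch does not do.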

The model architecture is as follows: (figure not captured here)
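
As a rough stand-in for the hierarchical part of the architecture, the sketch below computes the feature-map shape at each stage. It assumes the Swin-T style settings as I recall them from the paper (4x4 patches, an embedding dimension of 96, four stages, and a 2x2 patch merging step before each later stage that halves the resolution and doubles the channels). The function name `stage_shapes` and its parameters are hypothetical, and no actual layers are built.

```python
def stage_shapes(height, width, embed_dim=96, patch_size=4, num_stages=4):
    """Return (tokens_high, tokens_wide, channels) for each stage.

    Assumed scheme: stage 1 works on patch_size x patch_size patches;
    every later stage first merges 2x2 neighbouring patches, halving the
    spatial resolution and doubling the channel count.
    """
    h, w, c = height // patch_size, width // patch_size, embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            h, w, c = h // 2, w // 2, c * 2  # patch merging
        shapes.append((h, w, c))
    return shapes


for i, (h, w, c) in enumerate(stage_shapes(224, 224), start=1):
    print(f"stage {i}: {h} x {w} tokens, {c} channels")
```

For a 224x224 input this gives resolutions of 1/4, 1/8, 1/16, and 1/32 of the image, the same pyramid of scales that convolutional backbones produce, which is why the model can be dropped into detection and segmentation frameworks.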

I have not read the details of the paper yet.