open-mmlab / awesome-vit

awesome-vit

A curated list and survey of awesome Vision Transformers.

You can open the mind map source files with mind-mapping software, or download the high-resolution mind map images if you only want to browse them.

Contents

Survey

Only representative algorithms are listed in each category.

Image Classification

Chinese Blogs

Attention-based

Training Strategy

Model Improvements
Tokenization Module

Image to Token:

Token to Token:
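
For the image-to-token step, the original ViT splits an image into non-overlapping patches, flattens each patch, and linearly projects it to the embedding dimension. The pure-Python sketch below assumes a nested-list image of shape H x W x C; `patchify` and `linear` are hypothetical helper names, not code from any listed paper, and a real model would use a tensor library instead.

```python
import random


def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping patches.

    Each patch is flattened row by row into one vector of
    patch_size * patch_size * C values: one token per patch.
    """
    h, w = len(image), len(image[0])
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    tokens = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    patch.extend(image[r][c])  # all channels of one pixel
            tokens.append(patch)
    return tokens


def linear(tokens, weight, bias):
    """Project each token with weight (in_dim x out_dim) plus bias (out_dim)."""
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) + bias[j]
         for j in range(len(bias))]
        for t in tokens
    ]


# A 4x4 RGB image with 2x2 patches gives 4 tokens of length 12; a random
# (hypothetical) projection maps each one to an 8-dimensional embedding.
random.seed(0)
image = [[[random.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
tokens = patchify(image, patch_size=2)
weight = [[random.gauss(0.0, 0.02) for _ in range(8)] for _ in range(12)]
embeddings = linear(tokens, weight, [0.0] * 8)
print(len(tokens), len(tokens[0]), len(embeddings[0]))  # 4 12 8
```

Because the patches do not overlap, this is equivalent to a convolution whose kernel size and stride both equal the patch size. Token-to-token methods such as T2T-ViT instead re-aggregate neighbouring tokens, typically with overlapping windows.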

Position Encoding Module

Explicit position encoding:

Implicit position encoding:
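
One fixed choice of explicit position encoding is the sine/cosine table from the original Transformer paper, added element-wise to the token embeddings. The original ViT learns this table as a parameter instead, but adds it to the tokens in the same way. A minimal sketch, assuming tokens are plain lists of floats; the function names are hypothetical:

```python
import math


def sinusoidal_position_encoding(num_tokens, dim):
    """Fixed sine/cosine table: channel 2i holds sin(pos / 10000^(2i/dim)),
    channel 2i+1 holds the cosine of the same angle."""
    assert dim % 2 == 0, "dim must be even"
    table = []
    for pos in range(num_tokens):
        row = []
        for i in range(dim // 2):
            angle = pos / (10000 ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def add_position_encoding(tokens, table):
    """Explicit encoding: add each position vector to its token element-wise."""
    return [[x + p for x, p in zip(tok, pos)] for tok, pos in zip(tokens, table)]


pe = sinusoidal_position_encoding(num_tokens=4, dim=6)
tokens = [[0.0] * 6 for _ in range(4)]
encoded = add_position_encoding(tokens, pe)
print(encoded[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```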

Attention Module

Global attention only:

Introduce extra local attention:
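
The difference between these two groups can be sketched with single-head scaled dot-product attention. In this simplified version the query, key and value projections are assumed to be identity, and local attention is modelled as non-overlapping 1D windows of `window` consecutive tokens; window-based models such as Swin Transformer partition a 2D token grid instead. `self_attention` is a hypothetical name.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, window=None):
    """Single-head scaled dot-product attention with identity Q/K/V projections.

    window=None: global attention, every token attends to all N tokens.
    window=k:    local attention, each token attends only to the tokens in its
                 own block of k consecutive positions.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        start = 0 if window is None else (i // window) * window
        stop = n if window is None else min(n, start + window)
        keys = range(start, stop)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in keys]
        weights = softmax(scores)
        out.append([sum(w * x[j][c] for w, j in zip(weights, keys)) for c in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
global_out = self_attention(x)
local_out = self_attention(x, window=2)
# Changing token 3 changes token 0's output under global attention only.
y = x[:3] + [[5.0, 5.0]]
print(self_attention(y)[0] != global_out[0], self_attention(y, window=2)[0] == local_out[0])  # True True
```

Global attention costs O(N^2) in the number of tokens N, while fixed-size windows reduce this to O(N * window), which is the main motivation for adding local attention.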

FFN Module

Using the local feature extraction capability of convolution to improve performance:
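
A common pattern in this group is to reshape the hidden tokens back into their 2D grid and apply a depthwise convolution between the two linear layers, so each token also mixes with its spatial neighbours. The sketch below uses a Linear, depthwise 3x3 conv, GELU, Linear order similar to PVTv2's convolutional FFN; the exact placement of the convolution and activation varies between papers, biases are omitted, and all names and weights are hypothetical.

```python
import math


def gelu(x):
    # Exact GELU via the Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(tokens, weight):
    """tokens: N x in_dim, weight: in_dim x out_dim (bias omitted for brevity)."""
    return [[sum(t[i] * weight[i][j] for i in range(len(t)))
             for j in range(len(weight[0]))] for t in tokens]


def depthwise_conv3x3(tokens, height, width, kernels):
    """Per-channel 3x3 convolution over the height x width token grid.

    tokens are stored row-major; zero padding keeps the grid size unchanged.
    kernels[c] is the 3x3 kernel applied to channel c only.
    """
    out = []
    for r in range(height):
        for col in range(width):
            pixel = []
            for c in range(len(tokens[0])):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, col + dc
                        if 0 <= rr < height and 0 <= cc < width:
                            acc += kernels[c][dr + 1][dc + 1] * tokens[rr * width + cc][c]
                pixel.append(acc)
            out.append(pixel)
    return out


def conv_ffn(tokens, height, width, w1, kernels, w2):
    """Linear -> depthwise 3x3 conv -> GELU -> Linear."""
    hidden = linear(tokens, w1)
    hidden = depthwise_conv3x3(hidden, height, width, kernels)
    hidden = [[gelu(v) for v in t] for t in hidden]
    return linear(hidden, w2)


# Hypothetical weights: 4 tokens on a 2x2 grid, 2 channels, hidden width 4.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w1 = [[0.1, 0.2, -0.1, 0.3], [0.0, -0.2, 0.4, 0.1]]
w2 = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.2], [-0.3, 0.1]]
kernels = [[[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]] * 4
out = conv_ffn(tokens, 2, 2, w1, kernels, w2)
print(len(out), len(out[0]))  # 4 2
```

With a kernel that is 1 at the centre and 0 elsewhere, the convolution becomes an identity and the block reduces to the plain ViT FFN.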

Normalization Module Location
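
The two usual placements are post-norm, as in the original Transformer, and pre-norm, as in ViT. A minimal sketch, assuming a parameter-free LayerNorm and a stand-in sublayer `double` in place of attention or the FFN (both hypothetical simplifications):

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one token to zero mean, unit variance (learnable affine omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_norm_block(x, sublayer):
    """Post-norm (original Transformer): LN(x + F(x))."""
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_block(x, sublayer):
    """Pre-norm (ViT): x + F(LN(x)); the residual path is never normalized."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def double(v):
    # Stand-in for an attention or FFN sublayer.
    return [2.0 * e for e in v]


x = [1.0, 2.0, 3.0]
print(post_norm_block(x, double))  # re-centred to a mean of about 0
print(pre_norm_block(x, double))   # keeps x's mean of about 2.0
```

With pre-norm the identity path carries `x` through every block unchanged, which is generally credited with making deep stacks easier to train; post-norm re-normalizes the sum after every block.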

Classification Prediction Head Module
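
The two common head inputs are the output of a learnable [class] token, which the original ViT prepends to the patch tokens, and global average pooling over the patch tokens, used by many later models such as Swin Transformer. A minimal sketch with hypothetical names and values, feeding either feature into a linear classifier:

```python
def class_token_head(encoder_out):
    """ViT-style head input: the output at index 0, where a learnable [class]
    token was prepended to the patch tokens before the encoder."""
    return encoder_out[0]


def gap_head(patch_tokens):
    """Global average pooling: the mean of all patch tokens, channel by channel."""
    n = len(patch_tokens)
    return [sum(channel) / n for channel in zip(*patch_tokens)]


def classify(feature, weight, bias):
    """Linear classifier: logits[j] = sum_i feature[i] * weight[i][j] + bias[j]."""
    return [sum(f * weight[i][j] for i, f in enumerate(feature)) + bias[j]
            for j in range(len(bias))]


# Hypothetical encoder output: [class] token followed by two patch tokens.
encoder_out = [[0.9, 0.1], [1.0, 3.0], [3.0, 1.0]]
print(class_token_head(encoder_out))  # [0.9, 0.1]
print(gap_head(encoder_out[1:]))      # [2.0, 2.0]
weight, bias = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]  # 2 features -> 2 classes
print(classify(gap_head(encoder_out[1:]), weight, bias))  # [3.0, -1.0]
```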

Others

(1) How to output multi-scale feature maps

(2) How to train a deeper Transformer
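
For item (1), one common answer, used by hierarchical backbones such as Swin Transformer, is to downsample the token grid between stages. The sketch below concatenates each 2x2 neighbourhood of tokens, halving the grid side and quadrupling the channels. It omits the LayerNorm and linear projection Swin applies afterwards, and the row-major concatenation order is an assumption (any consistent order works); `patch_merging` is a hypothetical name.

```python
def patch_merging(tokens, height, width):
    """Halve both sides of a token grid by concatenating each 2x2 neighbourhood.

    tokens: row-major list of height * width vectors with C channels.
    Returns the (height/2) * (width/2) merged vectors with 4C channels, plus the
    new grid size. Swin Transformer follows the concatenation with LayerNorm
    and a linear 4C -> 2C projection, which this sketch omits.
    """
    assert height % 2 == 0 and width % 2 == 0, "grid sides must be even"
    merged = []
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            top = r * width + c
            bottom = (r + 1) * width + c
            merged.append(tokens[top] + tokens[top + 1] + tokens[bottom] + tokens[bottom + 1])
    return merged, height // 2, width // 2


# A 4x4 grid with 1 channel yields a three-level feature pyramid.
tokens, h, w = [[float(i)] for i in range(16)], 4, 4
pyramid = [(h, w, len(tokens[0]))]
while h > 1:
    tokens, h, w = patch_merging(tokens, h, w)
    pyramid.append((h, w, len(tokens[0])))
print(pyramid)  # [(4, 4, 1), (2, 2, 4), (1, 1, 16)]
```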

MLP-based
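
MLP-based models such as MLP-Mixer replace self-attention with an MLP applied along the token axis (token mixing), followed by an MLP along the channel axis (channel mixing), each with a residual connection. The sketch below omits the LayerNorm before each MLP and all biases; the names and weights are hypothetical.

```python
import math


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def mlp(vec, w1, w2):
    """Two-layer MLP on one vector: GELU(vec @ w1) @ w2 (biases omitted)."""
    hidden = [gelu(sum(v * w1[i][j] for i, v in enumerate(vec))) for j in range(len(w1[0]))]
    return [sum(h * w2[i][j] for i, h in enumerate(hidden)) for j in range(len(w2[0]))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def mixer_block(x, token_w1, token_w2, channel_w1, channel_w2):
    """One MLP-Mixer-style block on x (num_tokens x channels), LayerNorm omitted.

    Token mixing: the same MLP runs along the token axis for every channel,
    so spatial positions exchange information without attention.
    Channel mixing: a second MLP runs along the channel axis for every token.
    """
    mixed = transpose([mlp(col, token_w1, token_w2) for col in transpose(x)])
    x = [[a + b for a, b in zip(row, m)] for row, m in zip(x, mixed)]  # residual
    return [[a + b for a, b in zip(row, mlp(row, channel_w1, channel_w2))] for row in x]


# Two tokens with one channel; only token 0 is non-zero at the input.
x = [[1.0], [0.0]]
token_w1, token_w2 = [[1.0], [1.0]], [[1.0, 1.0]]   # token MLP: 2 -> 1 -> 2
channel_w1, channel_w2 = [[0.0]], [[0.0]]           # channel MLP disabled
out = mixer_block(x, token_w1, token_w2, channel_w1, channel_w2)
print(out[1][0] > 0.0)  # True: token 1 received information from token 0
```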

ConvMixer-based
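
ConvMixer keeps the patch-embedding idea but mixes tokens with convolutions: each block applies a depthwise convolution with a residual connection for spatial mixing, then a pointwise (1x1) convolution for channel mixing, each followed by GELU and BatchNorm. The sketch below covers one block on an H x W x C nested-list grid; it omits BatchNorm and the patch embedding, and the names and weights are hypothetical.

```python
import math


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def depthwise_conv(grid, kernels):
    """Per-channel k x k convolution with 'same' zero padding; grid is H x W x C."""
    h, w, channels = len(grid), len(grid[0]), len(grid[0][0])
    k = len(kernels[0])
    p = k // 2
    return [[[sum(kernels[c][i][j] * grid[r + i - p][q + j - p][c]
                  for i in range(k) for j in range(k)
                  if 0 <= r + i - p < h and 0 <= q + j - p < w)
              for c in range(channels)]
             for q in range(w)]
            for r in range(h)]


def pointwise_conv(grid, weight):
    """1x1 convolution: mix the channels at every position with weight (C_in x C_out)."""
    return [[[sum(v * weight[i][o] for i, v in enumerate(px)) for o in range(len(weight[0]))]
             for px in row]
            for row in grid]


def convmixer_block(grid, dw_kernels, pw_weight):
    """Depthwise conv + GELU with a residual (spatial mixing), then pointwise
    conv + GELU (channel mixing). The BatchNorm layers are omitted."""
    spatial = depthwise_conv(grid, dw_kernels)
    grid = [[[x + gelu(s) for x, s in zip(px, sp)] for px, sp in zip(row, srow)]
            for row, srow in zip(grid, spatial)]
    return [[[gelu(v) for v in px] for px in row] for row in pointwise_conv(grid, pw_weight)]


# Hypothetical 4x4 grid with 2 channels and 3x3 averaging kernels.
grid = [[[1.0, -1.0] for _ in range(4)] for _ in range(4)]
kernels = [[[1 / 9] * 3 for _ in range(3)] for _ in range(2)]
out = convmixer_block(grid, kernels, [[0.5, 0.5], [0.5, -0.5]])
print(len(out), len(out[0]), len(out[0][0]))  # 4 4 2
```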

General Architecture Analysis

Others

Object Detection

Semantic Segmentation

Papers

Transformer Original Paper

ViT Original Paper

Image Classification

2020

2021

2022

Object Detection

Semantic Segmentation

Stay tuned, and PRs are welcome!