| Date | Method | Venue | Title | Code |
| :---: | :---: | :---: | :--- | :---: |
| 2020-xx-xx | iGPT | ICML 2020 | Generative Pretraining from Pixels | iGPT |
| 2020-10-22 | ViT | ICLR 2021 (Oral) | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT |
| 2021-04-08 | SiT | arXiv 2021 | SiT: Self-supervised vIsion Transformer | None |
| 2021-06-10 | MST | NeurIPS 2021 | MST: Masked Self-Supervised Transformer for Visual Representation | None |
| 2021-06-14 | BEiT | ICLR 2022 (Oral) | BEiT: BERT Pre-Training of Image Transformers | BEiT |
| 2021-11-11 | MAE | CVPR 2022 | Masked Autoencoders Are Scalable Vision Learners | MAE |
| 2021-11-15 | iBOT | ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | iBOT |
| 2021-11-18 | SimMIM | CVPR 2022 | SimMIM: A Simple Framework for Masked Image Modeling | SimMIM |
| 2021-11-24 | PeCo | arXiv 2021 | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | None |
| 2021-11-30 | MC-SSL0.0 | arXiv 2021 | MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning | None |
| 2021-12-16 | MaskFeat | CVPR 2022 | Masked Feature Prediction for Self-Supervised Visual Pre-Training | None |
| 2021-12-20 | SplitMask | arXiv 2021 | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | None |
| 2022-01-31 | ADIOS | ICML 2022 | Adversarial Masking for Self-Supervised Learning | None |
| 2022-02-07 | CAE | arXiv 2022 | Context Autoencoder for Self-Supervised Representation Learning | CAE |
| 2022-02-07 | CIM | ICLR 2023 | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | None |
| 2022-03-10 | MVP | arXiv 2022 | MVP: Multimodality-guided Visual Pre-training | None |
| 2022-03-23 | AttMask | ECCV 2022 | What to Hide from Your Students: Attention-Guided Masked Image Modeling | AttMask |
| 2022-03-29 | mc-BEiT | arXiv 2022 | mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | None |
| 2022-04-18 | Ge2-AE | arXiv 2022 | The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training | None |
| 2022-05-08 | MCMAE | NeurIPS 2022 | MCMAE: Masked Convolution Meets Masked Autoencoders | MCMAE |
| 2022-05-20 | UM-MAE | arXiv 2022 | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | UM-MAE |
| 2022-05-26 | GreenMIM | NeurIPS 2022 | Green Hierarchical Vision Transformer for Masked Image Modeling | GreenMIM |
| 2022-05-26 | MixMIM | arXiv 2022 | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | Coming soon |
| 2022-05-28 | SupMAE | arXiv 2022 | SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners | SupMAE |
| 2022-05-30 | HiViT | ICLR 2023 | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | None |
| 2022-06-01 | LoMaR | arXiv 2022 | Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction | LoMaR |
| 2022-06-22 | SemMAE | NeurIPS 2022 | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | SemMAE |
| 2022-08-11 | MILAN | arXiv 2022 | MILAN: Masked Image Pretraining on Language Assisted Representation | MILAN |
| 2022-11-14 | EVA | CVPR 2023 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | EVA |
| 2022-11-28 | AMT | arXiv 2022 | Good helper is around you: Attention-driven Masked Image Modeling | AMT |
| 2023-01-03 | TinyMIM | CVPR 2023 | TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models | TinyMIM |
| 2023-03-04 | PixMIM | arXiv 2023 | PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling | PixMIM |
| 2023-03-09 | LocalMIM | CVPR 2023 | Masked Image Modeling with Local Multi-Scale Reconstruction | LocalMIM |
| 2023-03-12 | AutoMAE | arXiv 2023 | Improving Masked Autoencoders by Learning Where to Mask | AutoMAE |
| 2023-03-15 | DeepMIM | arXiv 2023 | DeepMIM: Deep Supervision for Masked Image Modeling | DeepMIM |
| 2023-04-25 | Img2Vec | arXiv 2023 | Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders | None |
| 2023-12-30 | DTM | arXiv 2023 | Masked Image Modeling via Dynamic Token Morphing | None |
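
Almost every method in the table above is a variation on one recipe: split the image into patches, hide a subset, and train the network to predict the hidden content from the visible context. As a rough orientation, here is a minimal, self-contained PyTorch sketch of that recipe in its SimMIM-like form (keep the full token sequence, substitute a learned mask token, take an L2 pixel loss on masked patches only). Everything in it (`ToyMIM`, the toy sizes, `mask_ratio`) is illustrative and not taken from any listed paper; real methods also add positional embeddings and far larger backbones.

```python
import torch
import torch.nn as nn

class ToyMIM(nn.Module):
    """Toy masked-image-modeling objective: mask patches, reconstruct pixels."""

    def __init__(self, img_size=32, patch=8, dim=64, mask_ratio=0.75):
        super().__init__()
        self.patch = patch
        self.mask_ratio = mask_ratio
        patch_dim = 3 * patch * patch
        self.embed = nn.Linear(patch_dim, dim)               # patch -> token
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(dim, patch_dim)                # token -> pixels

    def patchify(self, x):
        # (B, 3, H, W) -> (B, N, 3 * p * p) non-overlapping patches
        B, C, H, W = x.shape
        p = self.patch
        x = x.unfold(2, p, p).unfold(3, p, p)                # B, C, h, w, p, p
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)

    def forward(self, x):
        patches = self.patchify(x)                           # reconstruction targets
        tokens = self.embed(patches)
        B, N, _ = tokens.shape
        # Hide a random subset of tokens behind a learned mask token.
        mask = torch.rand(B, N, device=x.device) < self.mask_ratio
        tokens = torch.where(mask[..., None],
                             self.mask_token.expand(B, N, -1), tokens)
        recon = self.head(self.encoder(tokens))              # predicted pixels
        # Loss is computed on masked patches only, as in MAE/SimMIM.
        return ((recon - patches) ** 2)[mask].mean()

model = ToyMIM()
loss = model(torch.randn(4, 3, 32, 32))                      # fake image batch
loss.backward()                                              # gradients for one step
```

The listed methods differ mainly in how they fill the two slots of this loop: the masking policy (random in MAE/SimMIM, attention-guided in AttMask/AMT, semantics-guided in SemMAE, learned in ADIOS/AutoMAE) and the reconstruction target (raw pixels, HOG features in MaskFeat, tokenizer codes in BEiT/PeCo/mc-BEiT, language-aligned features in MVP/MILAN/EVA).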