Closed: cashincashout closed this issue 3 years ago
All the numbers reported in the paper are obtained WITHOUT mixup & cutmix.
In the early stage of this project, I ran several ablation studies exploring DeiT-style augmentation (including mixup & cutmix) for self-supervised learning, but did not see a performance improvement. The mixup & cutmix augmentation code remains in the code release so that interested people can explore it further.
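For anyone who wants to experiment with that augmentation path, here is a minimal, label-free sketch of what mixup & cutmix applied to a batch of views could look like in the self-supervised setting. It is an illustration only, not the exact code in this repo; the function name `mixup_cutmix_views` and the default hyperparameters are assumptions.

```python
import torch

def mixup_cutmix_views(x, mixup_alpha=0.8, cutmix_alpha=1.0, switch_prob=0.5):
    """Illustrative mixup/cutmix on a batch of views (no labels, SSL setting).

    x: (B, C, H, W) batch of augmented views.
    Returns the mixed batch; the mixing coefficient is sampled per call.
    """
    B, _, H, W = x.shape
    perm = torch.randperm(B, device=x.device)  # partner images for mixing
    if torch.rand(1).item() < switch_prob:
        # CutMix: paste a random rectangle from the permuted batch.
        lam = torch.distributions.Beta(cutmix_alpha, cutmix_alpha).sample().item()
        cut_h, cut_w = int(H * (1 - lam) ** 0.5), int(W * (1 - lam) ** 0.5)
        cy, cx = torch.randint(H, (1,)).item(), torch.randint(W, (1,)).item()
        y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
        x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
        x = x.clone()
        x[:, :, y1:y2, x1:x2] = x[perm, :, y1:y2, x1:x2]
    else:
        # Mixup: convex combination of the batch with its permutation.
        lam = torch.distributions.Beta(mixup_alpha, mixup_alpha).sample().item()
        x = lam * x + (1 - lam) * x[perm]
    return x
```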
For your request, "vanilla DINO with Swin-T/Swin-B as the backbone, i.e., EsViT with only the view-level task and without mixup & cutmix", please see the newly added table in README.md:
| arch | params | tasks | linear | k-nn | download | logs (train) | logs (linear) | logs (knn) |
|---|---|---|---|---|---|---|---|---|
| ResNet-50 | 23M | V | 75.0% | 69.1% | full ckpt | train | linear | knn |
| EsViT (Swin-T, W=7) | 28M | V | 77.0% | 74.2% | full ckpt | train | linear | knn |
| EsViT (Swin-S, W=7) | 49M | V | 79.2% | 76.9% | full ckpt | train | linear | knn |
| EsViT (Swin-B, W=7) | 87M | V | 79.6% | 77.7% | full ckpt | train | linear | knn |
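For context on the "tasks = V" column: it denotes training with the view-level (DINO-style) objective only, without EsViT's region-level task. Below is a minimal sketch of that view-level cross-entropy between teacher and student outputs, with centering and the multi-crop loop simplified; the function name `view_level_loss` and the temperature values are illustrative assumptions, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def view_level_loss(student_out, teacher_out, center,
                    student_temp=0.1, teacher_temp=0.04):
    """DINO-style view-level objective (simplified sketch).

    student_out, teacher_out: (B, K) projection-head outputs for two views.
    center: (1, K) running center that keeps the teacher from collapsing.
    """
    # Sharpened, centered teacher distribution (no gradient through the teacher).
    t = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    # Cross-entropy of the teacher distribution against the student log-softmax.
    s = F.log_softmax(student_out / student_temp, dim=-1)
    return -(t * s).sum(dim=-1).mean()

# Usage sketch: loss for one (teacher view, student view) pair.
# In a real training loop, center would be updated with an EMA of teacher outputs.
B, K = 4, 1024
loss = view_level_loss(torch.randn(B, K), torch.randn(B, K), torch.zeros(1, K))
```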
Hi @ChunyuanLI, I noticed that mixup and cutmix are used during pre-training, which is not the case in DINO. I'm wondering about the performance gain brought by applying mixup & cutmix. Have you run any related experiments pre-trained without mixup? I'm especially interested in vanilla DINO with Swin-T/Swin-B as the backbone, i.e., EsViT with only the view-level task and without mixup & cutmix. It would be nice if you could share those results.