launchauto opened this issue 2 years ago
I got similar results. MAE + ViT-B + 400-epoch pretraining: the linear probing top-1 accuracy is 53.01.
Optimizer AdamW, lr=0.016, batch size=4096, weight decay=0, cosine decay; mixup=0.0, cutmix=0.0, label smoothing=0.0; warmup epochs=5, total training epochs=100; only random resized crop and random flip are used as data augmentation.
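For reference, a minimal sketch of that optimizer/schedule in PyTorch; the `head` module, feature dimension, and class count are placeholders, not the actual reproduction code:

```python
import math
import torch

# Placeholder classifier head (ViT-B feature dim 768 -> 1000 ImageNet classes, assumed).
head = torch.nn.Linear(768, 1000)

base_lr, epochs, warmup_epochs = 0.016, 100, 5
optimizer = torch.optim.AdamW(head.parameters(), lr=base_lr, weight_decay=0.0)

def lr_at_epoch(epoch):
    """Linear warmup for the first 5 epochs, then cosine decay to 0."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# At the start of each epoch:
# for g in optimizer.param_groups:
#     g["lr"] = lr_at_epoch(epoch)
```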
Thanks for sharing. According to your reproduction, the 1600-epoch pretrained ViT-B reaches only 83.2 end-to-end fine-tuning accuracy, a 0.4-point gap from the paper's report. However, the 400-epoch pretrained ViT-B already achieves 83.1 end-to-end fine-tuning accuracy. It seems the extra 1200 epochs of pretraining bring negligible improvement, which is quite confusing. Do you have any ideas about it?
Sorry, no idea.
Yeah, I used your linear-probe method and got about +0.33% when testing the MAE-Large model. However, it is still much lower than expected.
I also tried to reproduce the linear probe results with no success. Interestingly, when I used the non-normalized loss during pretraining, the linear probe accuracy for the base config increased to 60% (still much lower than the expected 68%). With the normalized loss I also got 53.9% accuracy, as you did. Were you able to reproduce the linear probe results lately?
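For anyone comparing the two variants, here is a minimal sketch of what I mean by the normalized vs. non-normalized reconstruction loss, assuming patchified predictions/targets of shape [N, L, patch_dim]; the function name and `norm_pix` flag are just illustrative:

```python
import torch

def mae_reconstruction_loss(pred, target, mask, norm_pix=True, eps=1e-6):
    """pred/target: [N, L, patch_dim] patchified pixels; mask: [N, L], 1 = masked patch.

    norm_pix=True normalizes each target patch by its own mean/std before the MSE,
    which is the "normalized loss" variant discussed above; norm_pix=False uses
    raw pixel targets.
    """
    if norm_pix:
        mean = target.mean(dim=-1, keepdim=True)
        var = target.var(dim=-1, keepdim=True)
        target = (target - mean) / (var + eps) ** 0.5

    loss = (pred - target) ** 2
    loss = loss.mean(dim=-1)                 # per-patch MSE
    return (loss * mask).sum() / mask.sum()  # average over masked patches only
```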
Hi @launchauto, @michuanhaohao, @mts42000,
Thanks for your efforts in reproducing the linear probe results.
I noticed that the official MAE repo has released the linear probe code. Thus, it is not hard to reproduce.
However, I was wondering whether you found what caused the inconsistent performance compared with your original reproduction. I don't think there is much difference between your configuration and the official one, yet the performance gap is very large.
Any help would be appreciated.
Dear author, I have reproduced your code using 64 V100 GPUs. Every setting is the same as in the paper (batch size 4096). The end-to-end fine-tuning accuracy is almost the same as in the paper; however, the linear probe accuracy is lower than expected. All of the experiments use normalized targets.
By the way, I used the MoCo v3 2D sin-cos position embedding in place of the 1D sin-cos position embedding, which may help (about +0.3% for MAE ViT-Base in both end-to-end fine-tuning and linear probing); see the sketch below.
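A rough sketch of such a fixed 2D sin-cos position embedding; the function name, grid size, and the PyTorch >= 1.10 `meshgrid` indexing argument are my own choices here, not the exact MoCo v3 code:

```python
import torch

def build_2d_sincos_pos_embed(h, w, embed_dim, temperature=10000.0):
    """Fixed (non-learned) 2D sin-cos position embedding of shape [1, h*w, embed_dim]."""
    assert embed_dim % 4 == 0, "embed_dim must be divisible by 4"
    grid_w = torch.arange(w, dtype=torch.float32)
    grid_h = torch.arange(h, dtype=torch.float32)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=torch.float32) / pos_dim
    omega = 1.0 / (temperature ** omega)
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]

    pos_embed = torch.cat(
        [torch.sin(out_w), torch.cos(out_w), torch.sin(out_h), torch.cos(out_h)], dim=1
    )
    return pos_embed[None, :, :]  # a zero embedding can be prepended for the [CLS] token

# e.g. for ViT-B/16 at 224x224 input: build_2d_sincos_pos_embed(14, 14, 768)
```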
I also tested your released 400-epoch MAE ViT-Base model; its linear probing top-1 accuracy is 50.91.
Did I miss any details mentioned in the paper?
For the linear probing hyperparameters, I followed the settings in the appendix of the paper: optimizer LARS, lr=6.4, batch size=16384, weight decay=0, momentum=0.9, cosine decay, warmup epochs=10, total training epochs=90; only random resized crop is used as data augmentation. I replaced the last layer norm with a batch norm (affine=False) before the classifier. During linear probing, I froze the backbone and only updated the head of the classifier (mean pooling + norm + fc).
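To make that setup concrete, a minimal sketch of the frozen-backbone probe head as described; `backbone`, `feat_dim`, and the use of plain SGD in place of LARS (which is not in torch.optim) are placeholders/assumptions:

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 768, 1000   # ViT-B feature dim, ImageNet classes (assumed)
backbone = nn.Identity()            # placeholder for the pretrained ViT encoder

# Freeze the encoder: only the BN + linear head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Sequential(
    nn.BatchNorm1d(feat_dim, affine=False, eps=1e-6),  # replaces the final LayerNorm
    nn.Linear(feat_dim, num_classes),
)

# The paper setting uses LARS (lr=6.4, weight decay=0, momentum=0.9); plain SGD is
# shown here only as a stand-in, since LARS is not part of torch.optim.
optimizer = torch.optim.SGD(head.parameters(), lr=6.4, momentum=0.9, weight_decay=0.0)

# Forward pass during probing (features = mean-pooled output of the frozen encoder):
# with torch.no_grad():
#     features = backbone(images)   # [N, feat_dim]
# logits = head(features)
```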