microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers
MIT License

Have you ever tried to train Swin-Large? #12

Open Dongshengjiang opened 2 years ago

Dongshengjiang commented 2 years ago

I found the swin_large_patch4_window7_224.yaml config file in your code. That raises an interesting question: how does the larger model perform?

shallowtoil commented 2 years ago

Hi @ChunyuanLI @jwyang, I'm also wondering about the linear probe accuracy of DINO/EsViT with Swin-L. Have you ever run any related experiments?

ChunyuanLI commented 2 years ago

I ran the experiments on EsViT (Swin-L) once, but did not get better results than the best number 81.3% reported in our paper.

Dongshengjiang commented 2 years ago

Can you provide the final results for Swin-L at 300 epochs? I got a k-NN accuracy of 74.6 at epoch 180 of 300, which is lower than msg_small at the same epoch. I suspect there could be two reasons: first, the 1k data may not be enough for the big model to converge; second, maybe swin_large needs a larger drop path rate?
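
For anyone who wants to try the drop path idea, here is a minimal sketch of how a maximum drop path rate is usually spread over the blocks: it grows linearly from 0 at the first block to the maximum at the last one. I am assuming EsViT follows this common stochastic-depth schedule, which I have not confirmed from the code. `drop_path_schedule` is a hypothetical helper, and the candidate rates are example values, not settings from the paper. The depths (2, 2, 18, 2) are the standard Swin-L stage depths.

```python
def drop_path_schedule(depths, max_rate):
    """Return per-stage lists of drop path rates.

    Rates increase linearly from 0.0 (first block) to max_rate (last block),
    counted over all blocks in the network. Hypothetical helper for illustration.
    """
    total = sum(depths)
    if total < 2:
        rates = [0.0] * total
    else:
        rates = [max_rate * i / (total - 1) for i in range(total)]

    # Split the flat list of rates back into one list per stage.
    schedule, start = [], 0
    for depth in depths:
        schedule.append(rates[start:start + depth])
        start += depth
    return schedule


SWIN_L_DEPTHS = (2, 2, 18, 2)  # standard Swin-L stage depths

# Example maximum rates to compare (assumed values, not from the paper).
for max_rate in (0.1, 0.2, 0.3):
    stages = drop_path_schedule(SWIN_L_DEPTHS, max_rate)
    last_per_stage = [round(stage[-1], 3) for stage in stages]
    print(f"max_rate={max_rate}: last-block rate per stage = {last_per_stage}")
```

Most of the regularization lands in the 18-block third stage, so raising the maximum rate mainly affects the deep part of the network. That is why it might matter more for swin_large than for the smaller models.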