Closed: secretu closed this issue 1 year ago
Hi, I noticed that the DeiT models used in your code are the version pretrained without distillation, which reaches 79.8% top-1 accuracy. Why not use the version pretrained with distillation and a distillation token, which reaches 81.2%?

Hi @secretu, thanks for your interest in our work. We use DeiT without distillation because this series of models is widely used in other architecture papers (e.g., Swin, PVT), whereas the distilled version is less frequently used for comparisons. The results should be consistent if you switch to a more powerful version.
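For reference, here is a minimal sketch of how the two kinds of checkpoints can be loaded with the `timm` library. It assumes the DeiT-Base 224 variants (which match the 79.8% / 81.2% top-1 figures above); other sizes follow the same naming pattern.

```python
import timm

# Non-distilled DeiT-Base, the version used in the repo (~79.8% top-1 on ImageNet-1k)
model = timm.create_model("deit_base_patch16_224", pretrained=True)

# Distilled DeiT-Base with the extra distillation token (~81.2% top-1)
model_distilled = timm.create_model("deit_base_distilled_patch16_224", pretrained=True)
```

Note that the distilled variant carries an extra distillation token, so its forward pass and head differ slightly from the plain model, which is one practical reason the non-distilled checkpoints are the more common baseline for comparison.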