hila-chefer / Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
MIT License

What is the difference between the DeiT and ViT models here? They look like the same? #45

Closed szubing closed 2 years ago

szubing commented 2 years ago

I have not found any difference between vit_base_patch16_224 and deit_base_patch16_224 in ViT_LRP.py. What makes them different?

hila-chefer commented 2 years ago

Hi @szubing, thanks for your interest!

The two models share the same architecture; the difference is in how they were trained. DeiT is trained with knowledge distillation, while ViT is trained with plain supervised learning. I recommend reading the two papers to understand the differences.
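For intuition, the core of DeiT-style hard distillation can be sketched as below. This is a minimal, illustrative pure-Python sketch of the loss idea only, not code from this repo or the DeiT training pipeline; all function names are hypothetical:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_class):
    # Negative log-probability of the target class.
    return -math.log(softmax(logits)[target_class])

def hard_distillation_loss(student_logits, teacher_logits, true_label):
    # DeiT's hard-distillation idea: average the loss against the true
    # label with the loss against the teacher's predicted class, so the
    # student learns from both the ground truth and the teacher.
    teacher_pred = max(range(len(teacher_logits)),
                       key=lambda i: teacher_logits[i])
    return 0.5 * cross_entropy(student_logits, true_label) \
         + 0.5 * cross_entropy(student_logits, teacher_pred)
```

A student whose logits agree with both the true label and the teacher's prediction gets a lower loss than one that disagrees, which is the extra training signal ViT does not receive. At inference time the distilled and plain models are used identically, which is why the two constructors in ViT_LRP.py look the same.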