lukemelas / PyTorch-Pretrained-ViT

Vision Transformer (ViT) in PyTorch

best performing model #2

Closed muaz1994 closed 3 years ago

muaz1994 commented 3 years ago

Hello, and many thanks for your code. May I ask which model performs best? According to page 12 of the paper, it seems that ViT-B/16 performs best. So do fewer layers work better?

lukemelas commented 3 years ago

In general, bigger models and smaller patches perform better. As you said, among the ImageNet-pretrained models, B_16 seems to be the best.
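A rough intuition for the "smaller patches are better (but costlier)" point: a ViT splits the image into non-overlapping patches, and each patch becomes one token, so halving the patch size quadruples the token count (and self-attention cost grows quadratically in the number of tokens). A minimal sketch of the arithmetic, using plain Python rather than anything from this repo's API:

```python
# Sketch: token count vs. patch size in a ViT.
# A square H x H image with patch size P yields (H // P) ** 2 patch tokens
# (the extra class token is excluded here for simplicity).

def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image."""
    assert image_size % patch_size == 0, "image size must be divisible by patch size"
    return (image_size // patch_size) ** 2

# At the 384x384 fine-tuning resolution used in the ViT paper:
for patch in (32, 16):
    n = num_tokens(384, patch)
    # Self-attention compares every token pair, so cost scales with n * n.
    print(f"patch {patch}: {n} tokens, ~{n * n:,} attention pairs")
```

So B/16 sees 4x as many tokens as B/32 at the same resolution, which is a big part of why it performs better and also why it is slower.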