lukemelas / PyTorch-Pretrained-ViT

Vision Transformer (ViT) in PyTorch
770 stars 124 forks

How to do "Fine-Tuning" or "Feature-Extraction" in the model B_16 (or even L_16)? #27

Open: dgrnd4 opened this issue 2 years ago

dgrnd4 commented 2 years ago

Hi there, do you know how I can use one of the two techniques above to do image classification on the Stanford Dogs dataset? I've already tried the "B_16_imagenet1k" model, but the accuracy I got on 4,160 images isn't that good.
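For context, here is a minimal sketch of how the two techniques usually differ in plain PyTorch. It is not this repo's documented recipe. `TinyBackbone` is a hypothetical stand-in used so the snippet runs without downloading weights; with this package one would presumably build the model as `ViT('B_16_imagenet1k', pretrained=True)` instead. The sketch assumes the classification head is exposed as an attribute named `fc`, which is worth confirming by printing the model. The helper names and learning rates are illustrative assumptions, and 120 is the number of breeds in Stanford Dogs.

```python
import torch
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for a pretrained ViT: a feature extractor plus an `fc` head."""

    def __init__(self, feat_dim: int = 32, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 8 * 8, feat_dim), nn.GELU()
        )
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))


def replace_head(model: nn.Module, num_classes: int) -> nn.Module:
    # Swap the old classifier for a freshly initialised one sized for the new dataset.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


def setup_feature_extraction(model: nn.Module, num_classes: int, lr: float = 1e-3):
    # Freeze every pretrained weight, then attach a new head.
    # The new head is trainable because new parameters default to requires_grad=True.
    for param in model.parameters():
        param.requires_grad = False
    replace_head(model, num_classes)
    return torch.optim.Adam(model.fc.parameters(), lr=lr)


def setup_fine_tuning(model: nn.Module, num_classes: int, lr: float = 1e-5):
    # Attach a new head and keep all weights trainable, typically with a small learning rate.
    replace_head(model, num_classes)
    for param in model.parameters():
        param.requires_grad = True
    return torch.optim.Adam(model.parameters(), lr=lr)


NUM_DOG_BREEDS = 120  # Stanford Dogs has 120 classes

model = TinyBackbone()
optimizer = setup_feature_extraction(model, NUM_DOG_BREEDS)

# One illustrative training step on random data (stand-in for a real DataLoader batch).
images = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, NUM_DOG_BREEDS, (4,))
loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Feature extraction trains only the new head, so it is cheap and harder to overfit on a small dataset. Fine-tuning updates the whole network and often reaches higher accuracy, at the cost of more memory and more careful learning-rate choices.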

I also saw that B_16 and L_16 differ in their model parameters, and therefore in the structure of the network itself. I haven't looked into this closely: can you explain the difference? Do you know where I can read about it?
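For reference, the Base and Large variants are defined in Table 1 of the ViT paper ("An Image is Worth 16x16 Words"): both use 16x16 patches, and Large is deeper and wider. The sketch below records those hyperparameters and estimates the resulting parameter counts. The `ViTConfig` class and `approx_param_count` function are hypothetical names introduced here, not part of this repo. The estimate assumes a 224x224 RGB input, a 1000-class head, and a standard encoder layout, so it may differ slightly from the checkpoints shipped with this package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical container for the hyperparameters that distinguish ViT variants."""

    layers: int       # number of Transformer encoder blocks
    hidden_dim: int   # token embedding width
    mlp_dim: int      # width of the feed-forward layer inside each block
    heads: int        # attention heads per block
    patch_size: int = 16


# Values from Table 1 of the ViT paper.
VIT_BASE_16 = ViTConfig(layers=12, hidden_dim=768, mlp_dim=3072, heads=12)
VIT_LARGE_16 = ViTConfig(layers=24, hidden_dim=1024, mlp_dim=4096, heads=16)


def approx_param_count(cfg: ViTConfig, image_size: int = 224,
                       num_classes: int = 1000, channels: int = 3) -> int:
    d, m = cfg.hidden_dim, cfg.mlp_dim
    num_patches = (image_size // cfg.patch_size) ** 2

    patch_embed = channels * cfg.patch_size ** 2 * d + d  # linear patch projection + bias
    pos_and_cls = (num_patches + 1) * d + d               # position embeddings + class token
    attention = 4 * (d * d + d)                           # Q, K, V and output projections
    mlp = d * m + m + m * d + d                           # two linear layers with biases
    layer_norms = 2 * 2 * d                               # two LayerNorms per block
    encoder = cfg.layers * (attention + mlp + layer_norms)
    final_norm = 2 * d
    head = d * num_classes + num_classes

    return patch_embed + pos_and_cls + encoder + final_norm + head


for name, cfg in [("B_16", VIT_BASE_16), ("L_16", VIT_LARGE_16)]:
    print(f"{name}: {approx_param_count(cfg) / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes to roughly 86M parameters for B_16 and 304M for L_16, so L_16 is about 3.5 times larger and correspondingly slower and more memory-hungry to fine-tune.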

edybk commented 2 years ago

following

longswordinhand commented 2 years ago

same question.