lukemelas / PyTorch-Pretrained-ViT

Vision Transformer (ViT) in PyTorch

Cannot load custom config #5

Open arkel23 opened 3 years ago

arkel23 commented 3 years ago

Hey! First of all, thanks for your contribution! I have looked at multiple ViT implementations and yours seems like the most straightforward, well-organized and simple to use.

I'd like to use your from_config method to instantiate the model, but I get the error below. I looked everywhere and couldn't find any from_config method, so that may be the problem?

from pytorch_pretrained_vit import ViT
# The following is equivalent to ViT('B_16')
config = dict(hidden_size=512, num_heads=8, num_layers=6)
model = ViT.from_config(config)

AttributeError: type object 'ViT' has no attribute 'from_config'

Also, I'm guessing that if you change anything in the config, the model would have to be retrained from scratch, since the pretrained weights wouldn't fit the model anymore, is that right?
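To make the question above concrete: a checkpoint tensor can only be copied into a parameter of the same shape, so any config change that alters shapes (hidden_size, for instance) leaves those weights without a pretrained counterpart. The following is a minimal sketch using plain shape tuples in place of tensors. The helper `split_by_shape` and the parameter names are hypothetical illustrations, not the repository's actual state-dict keys.

```python
def split_by_shape(pretrained_shapes, model_shapes):
    """Partition model parameter names by whether a pretrained tensor fits.

    Both arguments map parameter name -> shape tuple. A parameter is
    "loadable" only if the checkpoint has the same name with the same shape.
    """
    loadable, mismatched = [], []
    for name, shape in model_shapes.items():
        if pretrained_shapes.get(name) == shape:
            loadable.append(name)
        else:
            mismatched.append(name)
    return loadable, mismatched


# Illustrative shapes for a model with hidden size 768 and 1000 classes.
# The names are made up for this example.
pretrained = {
    "patch_embedding.weight": (768, 3, 16, 16),
    "block0.attn.proj_q.weight": (768, 768),
    "fc.weight": (1000, 768),
}

# Case 1: only the number of classes changes, so only the head mismatches.
new_head = dict(pretrained, **{"fc.weight": (10, 768)})
ok_head, bad_head = split_by_shape(pretrained, new_head)

# Case 2: hidden size shrinks to 512, so every listed tensor changes shape.
smaller = {
    "patch_embedding.weight": (512, 3, 16, 16),
    "block0.attn.proj_q.weight": (512, 512),
    "fc.weight": (1000, 512),
}
ok_small, bad_small = split_by_shape(pretrained, smaller)
```

In the first case everything except the classifier head could reuse pretrained weights; in the second case nothing fits, which matches the guess that such a model would need training from scratch.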

Another thing: you mention that this config is equivalent to ViT('B_16'), but in B_16 shouldn't num_heads=12 and num_layers=12? And what is hidden_size=512 for? I cannot find any part of the code that refers to it.
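For reference, the ViT-Base configuration in the original ViT paper uses hidden size 768, MLP size 3072, 12 heads, and 12 layers, so the values in the snippet above do not describe B_16. Below is a minimal sketch of what a config object with a from-dict constructor could look like. `ViTConfig`, `from_dict`, and `head_dim` are hypothetical names chosen for illustration and are not part of pytorch_pretrained_vit.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical config holder; defaults follow ViT-Base with 16x16 patches."""

    patch_size: int = 16
    hidden_size: int = 768  # width of each token embedding
    mlp_size: int = 3072    # width of the feed-forward layer in each block
    num_heads: int = 12
    num_layers: int = 12

    def __post_init__(self):
        # Multi-head attention splits the hidden size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self):
        return self.hidden_size // self.num_heads

    @classmethod
    def from_dict(cls, overrides):
        # Reject misspelled or unsupported keys instead of ignoring them.
        known = {f.name for f in fields(cls)}
        unknown = set(overrides) - known
        if unknown:
            raise KeyError(f"unknown config keys: {sorted(unknown)}")
        return cls(**overrides)


b16 = ViTConfig()
custom = ViTConfig.from_dict(dict(hidden_size=512, num_heads=8, num_layers=6))
```

Here `custom` differs from `b16` in three fields, which is the mismatch the question points out. Under this sketch, hidden_size is the embedding width that the other layer shapes derive from.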

Thanks in advance.

arp95 commented 3 years ago

Is there an update on this? I cannot load a ViT with a custom config in my case either.

khawar-islam commented 3 years ago

@arkel23 Did you train on different datasets using the pre-trained ViT model?

arkel23 commented 3 years ago

> @arkel23 Did you train on different datasets using the pre-trained ViT model?

Yes, I did. I trained some models for anime character face recognition at https://github.com/arkel23/animesion/tree/main/classification, using this repository as the baseline for the ViT model and adding only the training pipeline and the datasets.