Open 12sf12 opened 2 years ago
Hi
Thanks for your outstanding work.
I faced an issue when loading one of the pretrained ViT-Base checkpoints from this URL: 'https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth'
The state dict does not contain the key 'visual_encoder.pos_embed', so loading fails. For instance, the following code raises an error:
model_url = 'https://dl.fbaipublicfiles.com/deit/deit_base_patch16_224-b5f2ef4d.pth'
model = blip_decoder(pretrained=model_url, image_size=224, vit='base')
Would it be possible to share the recent lightweight pretrained model? This issue occurs only with the checkpoint mentioned above.
Many Thanks.
Hi, my implementation of ViT is based on the timm codebase. You might want to try the pretrained weights from timm.
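Not stated in the thread, but the error message suggests the DeiT checkpoint stores ViT weights without the 'visual_encoder.' prefix that the BLIP model expects. A minimal sketch of one possible workaround, under that assumption, is to rename the keys before loading (the stand-in dictionary below mimics a checkpoint's state dict; in practice you would obtain it via torch.hub.load_state_dict_from_url and pass the result to model.load_state_dict):

```python
def prefix_vit_keys(state_dict, prefix="visual_encoder."):
    """Return a copy of state_dict with each key prefixed for the BLIP model.

    Assumption: the checkpoint keys (e.g. 'pos_embed') only differ from the
    model's expected keys (e.g. 'visual_encoder.pos_embed') by this prefix.
    """
    return {prefix + key: value for key, value in state_dict.items()}


# Stand-in for a real DeiT state dict (values would be tensors in practice).
deit_sd = {"pos_embed": None, "cls_token": None, "patch_embed.proj.weight": None}
blip_sd = prefix_vit_keys(deit_sd)
print("visual_encoder.pos_embed" in blip_sd)  # True
```

Whether this alone is enough depends on the checkpoint; the timm weights the maintainer points to may already use a compatible layout.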