openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

Question about the ResNet-50 checkpoint #445

Open · MorningStarOvO opened this issue 3 months ago

MorningStarOvO commented 3 months ago

Have the ResNet-50 weights changed at any point in the past three years?
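One way to check whether a locally cached checkpoint differs from the published one is to compare its SHA256 digest with the expected hash. The sketch below assumes, as the `_download` helper in CLIP's `clip.py` appears to do, that the checkpoint URL embeds the expected SHA256 as the second-to-last path segment. The helper names (`expected_sha256_from_url`, `file_sha256`, `checkpoint_matches`) are hypothetical and used only for illustration.

```python
import hashlib
from urllib.parse import urlparse


def expected_sha256_from_url(url: str) -> str:
    # Assumption: the URL has the form .../<sha256>/<filename>.pt,
    # so the hash is the second-to-last path segment.
    return urlparse(url).path.rstrip("/").split("/")[-2]


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Hash the file in chunks so large checkpoints are not loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path: str, url: str) -> bool:
    # True if the local file's digest equals the hash embedded in the URL.
    return file_sha256(path) == expected_sha256_from_url(url)
```

In practice, the URL could be read from the private `clip.clip._MODELS` dict and the cached file is typically found under `~/.cache/clip`. Both are implementation details of the repository, not a public API, and may change between versions.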

With the ResNet-50 weights that can be downloaded now, I can't reproduce the results reported in the original paper, yet, surprisingly, the results do reproduce with the ViT backbone.

I'm using the latest version of the CLIP code, and it's not clear what is causing this. The ViT results are nearly identical to those in the original paper, but I evaluated ResNet-50 on more than 10 datasets and its results were much worse!
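When comparing backbones across many datasets, it may help to confirm that the zero-shot scoring step is identical for both, so that any remaining gap can be attributed to the weights or preprocessing rather than the evaluation code. Below is a minimal pure-Python sketch of top-1 zero-shot accuracy, assuming image embeddings and one (possibly prompt-ensembled) text embedding per class have already been extracted, for example with `model.encode_image` and `model.encode_text`. The function names are hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Scale to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_predict(image_emb: Vector, class_embs: Sequence[Vector]) -> int:
    # Predict the class whose text embedding has the highest cosine similarity.
    img = l2_normalize(image_emb)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(c))) for c in class_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def zero_shot_accuracy(
    image_embs: Sequence[Vector], labels: Sequence[int], class_embs: Sequence[Vector]
) -> float:
    # Fraction of images whose predicted class equals the ground-truth label.
    correct = sum(
        zero_shot_predict(img, class_embs) == label
        for img, label in zip(image_embs, labels)
    )
    return correct / len(labels)
```

Because the learned logit scale and the softmax do not change which class scores highest, the argmax over cosine similarities gives the same top-1 prediction as ranking the model's logits.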