openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
MIT License

How do I fine-tune/train CLIP on MNIST? #411

Open GitOutOfMyBed opened 8 months ago

GitOutOfMyBed commented 8 months ago

Does anyone know how to fine-tune CLIP on MNIST? If I pass in a batch of 32 images and 10 unique labels, I don't know what the loss function should be. In CLIP's approach, each image in a batch is paired with its own unique text, so the image-text pairing is one-to-one. In my scenario, each image has exactly one label, but a label does not correspond to a unique image: many images in the batch share the same label.
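To make the scenario concrete, here is one possible way to frame the loss. It is a sketch under stated assumptions, not the method used in the CLIP repository. It assumes each of the 10 labels is turned into one text prompt (for example "a photo of the digit 3", a hypothetical wording), that the image and text encoders have already produced embeddings, and that each image is classified against the 10 prompt embeddings with ordinary cross-entropy instead of the one-to-one contrastive loss. The names `normalize`, `class_cross_entropy`, and the default `logit_scale=100.0` are illustrative choices. The code is plain Python so it runs without dependencies. In a real training loop the same computation would more likely be done with a tensor library.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_cross_entropy(image_embs, text_embs, labels, logit_scale=100.0):
    """Mean cross-entropy of each image against the class-prompt embeddings.

    image_embs:  one vector per image in the batch (e.g. 32 of them)
    text_embs:   one vector per class prompt (e.g. 10 of them)
    labels:      class index (0..len(text_embs)-1) for each image
    logit_scale: temperature multiplier (assumed value, not taken from the post)
    """
    image_embs = [normalize(v) for v in image_embs]
    text_embs = [normalize(v) for v in text_embs]
    total = 0.0
    for img, label in zip(image_embs, labels):
        # Scaled cosine similarity between this image and every class prompt.
        logits = [logit_scale * sum(a * b for a, b in zip(img, txt))
                  for txt in text_embs]
        # Numerically stable log-sum-exp over the class axis.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the correct class.
        total += log_z - logits[label]
    return total / len(labels)


# Stand-in data: random vectors in place of real encoder outputs.
rng = random.Random(0)
dim = 8
batch_images = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(32)]
class_prompts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10)]
batch_labels = [rng.randrange(10) for _ in range(32)]

loss = class_cross_entropy(batch_images, class_prompts, batch_labels)
print(f"loss on random embeddings: {loss:.4f}")
```

With this framing, repeated labels in a batch are not a problem, because each image is scored against the 10 class prompts and not against the other images' texts.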