moein-shariatnia / OpenAI-CLIP

Simple implementation of OpenAI CLIP model in PyTorch.
MIT License
574 stars · 85 forks

Duplicate images in training? #11

Closed rahul-ohlan closed 11 months ago

rahul-ohlan commented 11 months ago

Hello Moein,

Thanks so much for the code! It has been really helpful as I have been trying to implement it in some other domain.

Question: I discovered that the dataset (Flickr8k) contains some 40k image–text pairs, with each image appearing in 5 of them. When building the dataloaders, it's enforced that image ids in the train and validation loaders DO NOT overlap; however, identical images still persist within each loader, which increases the probability of the same image encoding showing up more than once in a batch during training. I am not sure, but is that okay? I may be wrong here, but IMO the contrastive loss will penalise the same image paired with a different, yet still similar, text encoding. I'd be super grateful for clarification!

Thanks!

moein-shariatnia commented 11 months ago

identical images still persist within each loader, which increases the probability of the same image encoding showing up more than once in a batch during training. I am not sure, but is that okay?

Hi Rahul,

I'm glad it has been helpful to you.

You're totally right; the current dataset is not ideal for the contrastive loss function. As you mentioned, it's possible for the same image to appear with two different captions in one batch, in which case the contrastive loss tries to pull their embeddings apart, which is not what we want.
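To make the issue concrete, here is a minimal sketch (function and variable names are illustrative, not the repo's exact code) of the standard CLIP-style symmetric cross-entropy, where every off-diagonal image–text pair in the batch is treated as a negative:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=1.0):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (batch, batch) similarity matrix; entry (i, j) compares image i, text j
    logits = image_emb @ text_emb.t() / temperature
    # row i's only "correct" text is caption i; if image i == image j for
    # some j != i, caption j is also a valid match, but the loss still
    # pushes that pair apart as a false negative
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

With duplicate images in a batch, two rows of `logits` are identical, so the model is asked to assign different targets to indistinguishable inputs.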

At the time of writing the tutorial, I was aware of this and experimented to see whether it actually hurts training. As I got promising initial results with the model, I was so happy to share it that I didn't make the time to change the dataset :) that's definitely my fault. I hope I can make the time in the future to change the dataset, or at least add some constraints in the loaders to prevent putting the same image in one batch (this is much easier to do!).
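The "constraint in the loaders" idea could be sketched as a custom batch sampler (names here are hypothetical, not from the repo) that picks at most one caption row per image for each batch:

```python
import random
from collections import defaultdict

class UniqueImageBatchSampler:
    """Yield batches of row indices such that no two rows in a batch
    share the same underlying image."""

    def __init__(self, image_ids, batch_size, seed=None):
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # group caption-row indices by their image id
        self.by_image = defaultdict(list)
        for row_idx, img_id in enumerate(image_ids):
            self.by_image[img_id].append(row_idx)

    def __iter__(self):
        # pick one caption row per image for this epoch, then shuffle;
        # re-sampling each epoch cycles through the 5 captions over time
        pool = [self.rng.choice(rows) for rows in self.by_image.values()]
        self.rng.shuffle(pool)
        for i in range(0, len(pool), self.batch_size):
            yield pool[i:i + self.batch_size]

    def __len__(self):
        # number of batches per epoch (ceil division)
        return -(-len(self.by_image) // self.batch_size)
```

A PyTorch `DataLoader` accepts any iterable of index lists via its `batch_sampler` argument, e.g. `DataLoader(dataset, batch_sampler=UniqueImageBatchSampler(image_ids, 32))`.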

Thanks for pointing out this bug. Best.

rahul-ohlan commented 11 months ago

Thanks for clarifying, Moein. Highly appreciated :)