openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Comparing image-text pairs #401

Open havardox opened 8 months ago

havardox commented 8 months ago

I'm exploring CLIP for similar-product retrieval by combining a product's description and image as input. As I understand it, CLIP excels at image-to-text and text-to-image retrieval, but I'm curious whether it can handle a combined text-and-image query. Is this possible with CLIP, and does anyone have examples?
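One common approach (not an official CLIP feature, just a sketch) is to encode the image and the description separately with CLIP, L2-normalize each embedding, and fuse them, e.g. by a weighted average, then use the fused vector for nearest-neighbor search over the catalog. The file names and the 0.5 fusion weight below are illustrative assumptions:

```python
# Sketch: fuse CLIP image and text embeddings for a combined product query.
# Uses the openai/CLIP package from this repo; the fusion weight and the
# example file names / descriptions are placeholders, not an official recipe.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_product(image_path: str, description: str, alpha: float = 0.5) -> torch.Tensor:
    """Return one L2-normalized vector combining image and text features."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([description], truncate=True).to(device)

    with torch.no_grad():
        img_feat = model.encode_image(image)
        txt_feat = model.encode_text(text)

    # Normalize each modality before fusing so neither dominates by scale.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)

    fused = alpha * img_feat + (1 - alpha) * txt_feat
    return fused / fused.norm(dim=-1, keepdim=True)

# Retrieval: cosine similarity between the query product and catalog products.
query = embed_product("query.jpg", "red leather ankle boots")
catalog = torch.cat([
    embed_product(path, desc) for path, desc in [
        ("boots_1.jpg", "brown leather boots"),
        ("sneakers_2.jpg", "white canvas sneakers"),
    ]
])
scores = (query @ catalog.T).squeeze(0)  # higher score = more similar product
print(scores)
```

Averaging normalized embeddings is only a heuristic baseline; the weight `alpha` can be tuned on a validation set, and methods that learn the image-text fusion (composed image retrieval) generally perform better.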

Suasy commented 8 months ago

Google's "search by image & text" feature can do this.

shyammarjit commented 2 months ago

Please check this: https://github.com/openai/CLIP/issues/115.