openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Implementation of MSCOCO retrieval metric #181

Open alex8937 opened 2 years ago

alex8937 commented 2 years ago

Can the authors confirm how recall is computed for both text-to-image and image-to-text retrieval, given that MSCOCO has 5 captions per image?

shyammarjit commented 5 months ago

Please check this: https://github.com/openai/CLIP/issues/115
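
For reference, the convention most MSCOCO retrieval evaluations follow (this is an assumption based on common practice, not something the CLIP authors confirm in this thread) is: for image-to-text, an image counts as a hit at K if any of its 5 ground-truth captions appears in the top-K retrieved captions; for text-to-image, each of the 5N captions is a separate query and counts as a hit if its own image appears in the top-K retrieved images. A minimal sketch assuming a precomputed similarity matrix of shape (N, 5N) with captions ordered so that caption j belongs to image j // 5 (the function name and layout here are illustrative):

```python
import torch

def recall_at_k(sim, ks=(1, 5, 10)):
    """sim: (N, 5N) image-to-text similarities; caption j belongs to image j // 5."""
    n_img, n_txt = sim.shape
    caption_to_image = torch.arange(n_txt) // 5  # ground-truth image index per caption

    results = {}

    # Image-to-text: hit at K if ANY of the image's 5 captions is in the top-K captions.
    ranks_i2t = sim.argsort(dim=1, descending=True)                    # (N, 5N) caption indices
    hits = caption_to_image[ranks_i2t] == torch.arange(n_img).unsqueeze(1)
    for k in ks:
        results[f"i2t_R@{k}"] = hits[:, :k].any(dim=1).float().mean().item()

    # Text-to-image: each caption is its own query; hit at K if its image is in the top-K images.
    ranks_t2i = sim.t().argsort(dim=1, descending=True)                # (5N, N) image indices
    hits = ranks_t2i == caption_to_image.unsqueeze(1)
    for k in ks:
        results[f"t2i_R@{k}"] = hits[:, :k].any(dim=1).float().mean().item()

    return results
```

Under this convention, image-to-text recall is averaged over the N images, while text-to-image recall is averaged over all 5N caption queries; issue #115 discusses the same point.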