openai / CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Implementation of MSCOCO retrieval metric #181

Open alex8937 opened 2 years ago

alex8937 commented 2 years ago

Can the authors confirm how recall is computed for both text-to-image and image-to-text retrieval, given that MSCOCO has 5 captions per image?

shyammarjit commented 5 months ago

Please check this: https://github.com/openai/CLIP/issues/115
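
For reference, the convention most MSCOCO retrieval evaluations follow (this is an assumption based on common practice, not something the CLIP authors confirm in this thread) is: for image-to-text, an image counts as a hit at K if any of its 5 ground-truth captions appears in the top-K retrieved captions; for text-to-image, each of the 5N captions is a separate query and counts as a hit if its own image appears in the top-K retrieved images. A minimal sketch assuming a precomputed similarity matrix of shape (N, 5N) with captions ordered so that caption j belongs to image j // 5 (the function name and layout here are illustrative):

```python
import torch

def recall_at_k(sim, ks=(1, 5, 10)):
    """sim: (N, 5N) image-to-text similarities; caption j belongs to image j // 5."""
    n_img, n_txt = sim.shape
    caption_to_image = torch.arange(n_txt) // 5  # ground-truth image index per caption

    results = {}

    # Image-to-text: hit at K if ANY of the image's 5 captions is in the top-K captions.
    ranks_i2t = sim.argsort(dim=1, descending=True)                    # (N, 5N) caption indices
    hits = caption_to_image[ranks_i2t] == torch.arange(n_img).unsqueeze(1)
    for k in ks:
        results[f"i2t_R@{k}"] = hits[:, :k].any(dim=1).float().mean().item()

    # Text-to-image: each caption is its own query; hit at K if its image is in the top-K images.
    ranks_t2i = sim.t().argsort(dim=1, descending=True)                # (5N, N) image indices
    hits = ranks_t2i == caption_to_image.unsqueeze(1)
    for k in ks:
        results[f"t2i_R@{k}"] = hits[:, :k].any(dim=1).float().mean().item()

    return results
```

Under this convention, image-to-text recall is averaged over the N images, while text-to-image recall is averaged over all 5N caption queries; issue #115 discusses the same point.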