salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Confidence score in image captioning #61

helleuch closed this issue 2 years ago

helleuch commented 2 years ago

Hello, I am using BLIP for image captioning, and I would like to retrieve a confidence score for the generated caption. Is there a way to do this?

LiJunnan1992 commented 2 years ago

The decoder cannot provide a confidence score directly. You can use the encoder to compute the image-text matching score, which gives a measure of how well the caption describes the image.
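For reference, here is a minimal sketch of that, following the image-text matching usage in this repo's demo notebook. The checkpoint URL, image size, normalization constants, and local image path below are assumptions taken from the demo, so double-check them against the current README:

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode

from models.blip_itm import blip_itm  # image-text matching model from this repo

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
image_size = 384

# Preprocessing as in the demo notebook (resize + CLIP-style normalization).
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073),
                         (0.26862954, 0.26130258, 0.27577711)),
])
raw_image = Image.open('example.jpg').convert('RGB')  # hypothetical local image path
image = transform(raw_image).unsqueeze(0).to(device)

# Assumed retrieval checkpoint; see the README / demo notebook for the current URLs.
model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_retrieval_coco.pth'
model = blip_itm(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

caption = 'a woman sitting on the beach with a dog'  # e.g. the caption produced by the captioning model

with torch.no_grad():
    # ITM head: binary match / no-match classifier; take P(match) as the confidence.
    itm_output = model(image, caption, match_head='itm')
    itm_score = torch.nn.functional.softmax(itm_output, dim=1)[:, 1]

    # ITC head: similarity between the image and text features (an alternative score).
    itc_score = model(image, caption, match_head='itc')

print('ITM match probability: %.4f' % itm_score.item())
print('ITC image-text similarity: %.4f' % itc_score.item())
```

The ITM head returns a match probability in [0, 1], while the ITC head returns a raw feature similarity; the ITM probability is usually the more natural "confidence" for a generated caption.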

helleuch commented 2 years ago

Thank you for your answer! I did indeed do what you suggested.

shrijayan commented 1 year ago

> The decoder cannot provide a confidence score directly. You can use the encoder to compute the image-text matching score, which gives a measure of how well the caption describes the image.

I am using BLIP for Visual Question Answering. Is there any possibility of calculating a confidence score for this?

Also, if possible, could you describe in more detail, ideally with an example, how to use the encoder to compute the image-text matching score that measures how well the caption describes the image?