showlab / Image2Paragraph

[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
Apache License 2.0

Retrieval Result on COCO #3

Closed hfutzzw closed 1 year ago

hfutzzw commented 1 year ago

Hi, thanks for your interesting work. Could you explain more clearly why the Image2Paragraph method achieves better retrieval results on COCO?

FingerRec commented 1 year ago

Hi hfutzzw: Of course. We first generate a paragraph for each image in the COCO dataset. Then we perform zero-shot retrieval with a conventional BERT model whose parameters are frozen (no image encoder is used).

Finally, we normalise the outputs for the paragraph and the original caption, compute the similarity scores, and select the top-K samples.
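The normalise, score, and top-K step described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the repository's actual code: the function names `l2_normalise` and `top_k_matches` are hypothetical, cosine similarity is assumed as the similarity score, the caption is assumed to be the query, and the toy 3-dimensional vectors stand in for the frozen BERT text embeddings.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalise(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_matches(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, cosine similarity) for the k candidates closest to the query."""
    q = l2_normalise(query)
    scores = []
    for idx, cand in enumerate(candidates):
        c = l2_normalise(cand)
        # After normalisation, the dot product equals the cosine similarity.
        scores.append((idx, sum(a * b for a, b in zip(q, c))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy vectors standing in for text embeddings (assumption: in practice these
# would come from the frozen BERT model mentioned above).
caption_embedding = [1.0, 0.0, 0.0]
paragraph_embeddings = [
    [0.0, 1.0, 0.0],
    [2.0, 0.1, 0.0],
    [1.0, 1.0, 0.0],
]

ranked = top_k_matches(caption_embedding, paragraph_embeddings, k=2)
for index, score in ranked:
    print(f"paragraph {index}: similarity {score:.4f}")
```

Because both sides are normalised first, the ranking depends only on the direction of each embedding and not on its magnitude, which is why the second toy paragraph ranks first despite its larger norm.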

FingerRec commented 1 year ago

Marked as resolved due to no response.