haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.36k stars 2.25k forks

[Question] Image-text match #1774

Open hazardout opened 4 days ago

hazardout commented 4 days ago

Question

Hi author! I would like to know whether there is a way to get the similarity of an image-text pair with LLaVA, or what prompt I should use to make LLaVA output a comparable measure.

For example, how should I run inference if I want to choose, among multiple pictures, the one that best matches a text description?
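
To make the task concrete, the selection step I have in mind could be sketched roughly as below. This is only a sketch under assumptions: `score_fn` is a hypothetical placeholder for whatever image-text similarity LLaVA could provide (for instance, how strongly the model answers "Yes" when asked whether the image matches the description). It is not an existing LLaVA API, and `pick_best_match` is a name made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")


def pick_best_match(
    images: Sequence[Image],
    text: str,
    score_fn: Callable[[Image, str], float],
) -> Tuple[int, List[float]]:
    """Score every image against `text` and return (best index, all scores).

    `score_fn` is a hypothetical stand-in for an image-text similarity
    obtained from the model; higher is assumed to mean a better match.
    """
    if not images:
        raise ValueError("need at least one candidate image")
    # One score per candidate image, in the same order as `images`.
    scores = [float(score_fn(image, text)) for image in images]
    # Index of the highest score; ties resolve to the earliest candidate.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores
```

What I am missing is the `score_fn` part, i.e. how to get such a score out of LLaVA for a single image-text pair.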