Hi author!
I would really like to know whether there is any way to get a similarity score for image-text pairs, or what prompt I should use to make LLaVA output something relevant.
For example, if I want to pick the image that best matches a text description from among several images, how should I run inference?
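To make the selection step concrete, here is a minimal sketch of what I have in mind. It assumes the text and each candidate image have already been encoded into vectors (for example by a CLIP-style encoder; the embedding source is my assumption, not something LLaVA is known to expose). It then picks the image with the highest cosine similarity. The names `best_matching_image` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence, List, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_image(
    text_emb: Sequence[float], image_embs: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, score) of the image embedding most similar to the text embedding."""
    scores = [cosine_similarity(text_emb, img) for img in image_embs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical toy embeddings; real ones would come from an image/text encoder.
text_emb = [1.0, 0.0, 0.5]
image_embs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.4], [-1.0, 0.0, 0.0]]
idx, score = best_matching_image(text_emb, image_embs)
print(idx, round(score, 3))
```

What I don't know is whether LLaVA itself can provide such a score, or whether a prompt-based approach would be better.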