Closed by eyuansu62 11 months ago
Thanks for your interest in our paper! Our benchmark is a diagnostic suite for analyzing hallucination in LVLMs. If the answer is only yes or no, with no further explanation, it is hard to tell whether a failure is a language hallucination or a visual illusion. Sometimes GPT-4V generates "yes" at the beginning of its answer, but the semantic meaning of the rest of the answer is negative. Keyword matching alone therefore does not work. Using GPT-4 as the judge addresses this problem.
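To illustrate the keyword-matching problem, here is a minimal sketch. The `keyword_match` function and the sample response are hypothetical, written only for this example; they are not taken from the benchmark code or from real GPT-4V output.

```python
import re


def keyword_match(response: str) -> str:
    """Naive baseline (hypothetical): label a response by its first yes/no keyword."""
    match = re.search(r"\b(yes|no)\b", response.lower())
    return match.group(1) if match else "unknown"


# Made-up response: it opens with "Yes", but the rest of the answer is negative.
response = (
    "Yes, at first glance the two lines look equal, "
    "but the top line is actually longer, so they are not the same length."
)

print(keyword_match(response))  # prints "yes", although the answer means "no"
```

A judge that reads the whole response, such as GPT-4, can weigh the explanation against the leading keyword, which this baseline cannot do.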
According to the paper, the LVLM's output is one of {yes, no, unknown}.