haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

hallucinations #112

Open Bensolz opened 1 year ago

Bensolz commented 1 year ago

feature

Hello, just wanted to say that the model works great in general, although it seems to have an issue with visual and textual hallucinations. For example, if I ask the model what color the car in an image is, and the image contains no car, it just hallucinates one. Similar things happen with text-only requests, where the model hallucinates facts all the time. The strange thing is that it does this more often than Vicuna, the model it is based on. Fixing these issues would make the model a lot more reliable, and could possibly be done through better instruction fine-tuning on examples like the ones I listed above.

Anyway, thanks for reading my request. I think LLaVA has great potential to become an open-source GPT-4, and eventually maybe even surpass it.
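
For anyone who wants to reproduce the probe described above, here is a minimal sketch. It assumes the Hugging Face port of LLaVA-1.5 under the `llava-hf/llava-1.5-7b-hf` model ID and its prompt template, which are not part of this repo's own CLI, so adapt as needed:

```python
# Minimal object-hallucination probe: ask about an object that is NOT in the
# image and check whether the model invents an answer instead of saying so.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint; swap in your own
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Any image known to contain no car works; this COCO image shows two cats.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Prompt template used by the HF LLaVA-1.5 port.
prompt = "USER: <image>\nWhat color is the car in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# A non-hallucinating model should answer that there is no car in the image
# rather than picking a color.
print(processor.decode(output[0], skip_special_tokens=True))
```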

haotian-liu commented 1 year ago

Hi @Bensolz, thanks for the interest in our work and for the great feedback. Hallucination is definitely one of the most important weaknesses we are striving to tackle. It is harder to address due to the existence of a separate vision encoder. Please stay tuned for future updates, thanks!
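
For context on why the separate encoder makes this hard: LLaVA encodes the image with a frozen CLIP vision tower and maps the patch features into the language model's embedding space through a learned projection, so the LLM can only condition on what survives that bottleneck. A toy sketch of that pipeline, assuming CLIP ViT-L/14 and a Vicuna-7B-sized embedding width; the projector here is a randomly initialized stand-in, not LLaVA's trained weights:

```python
# Toy sketch of the LLaVA pipeline: frozen vision encoder -> learned projector
# -> visual tokens in the LLM's embedding space.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel

encoder_name = "openai/clip-vit-large-patch14"  # LLaVA's default vision tower
processor = CLIPImageProcessor.from_pretrained(encoder_name)
encoder = CLIPVisionModel.from_pretrained(encoder_name)

llm_hidden_size = 4096  # Vicuna-7B embedding width (assumed for this sketch)
projector = nn.Linear(encoder.config.hidden_size, llm_hidden_size)

image = Image.new("RGB", (224, 224), "gray")  # stand-in for a real image
pixels = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    patch_features = encoder(pixels).last_hidden_state  # (1, 257, 1024)
    visual_tokens = projector(patch_features)           # (1, 257, 4096)

# These projected tokens are spliced into the LLM's input sequence; the LLM
# never sees the raw pixels, only this encoder's summary of them.
print(visual_tokens.shape)
```

If the encoder's features do not represent the absence of an object, tuning the language model alone cannot recover that evidence, which is why the encoder choice is entangled with hallucination.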

Bensolz commented 1 year ago

Thanks for the reply. Interestingly, I noticed that MiniGPT-4 seems to be less vulnerable to this, and it uses BLIP-2's vision encoder, so maybe just using a better vision encoder and the latest checkpoint of Vicuna could mitigate the issues.

xdevfaheem commented 1 year ago

No fine-tune can fix hallucination. We need an architecture redesign.