No Fine-Tuning Can Fix Hallucination. We Need an Architecture Redesign.
Open · feature
Bensolz opened this issue 1 year ago
Hello, I just wanted to say that the model works great in general, although it has an issue with visual and textual hallucinations. For example, if I ask the model "What color is the car in the image?" about an image that contains no car, it just hallucinates one (a minimal repro sketch is below). Similar things happen with text-only requests, where the model hallucinates facts all the time; strangely, it does this more often than Vicuna, the model it is based on.
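For reference, here is roughly how the failure can be reproduced. This is a minimal sketch assuming the community `llava-hf/llava-1.5-7b-hf` checkpoint and the Hugging Face `transformers` LLaVA integration rather than this repo's own inference scripts, and the image URL is just a placeholder for any car-free photo:

```python
# Minimal hallucination probe: ask about an object that is absent from the image.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Placeholder URL; any image that contains no car works here.
image = Image.open(
    requests.get("https://example.com/no_car.jpg", stream=True).raw
)

# LLaVA-1.5 prompt format; the question deliberately targets an absent object.
prompt = "USER: <image>\nWhat color is the car in this image?\nASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
# A faithful model should answer that there is no car;
# in my experience it often invents a color instead.
```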
Fixing these issues would make the model a lot more reliable, and it could possibly be done through better instruction fine-tuning on negative examples like the one above; a sketch of what such a training sample might look like follows.
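Concretely, a negative sample could pair a question about an absent object with a refusal-style answer. The field names below follow my reading of the `llava_instruct_150k.json` conversation format, so treat the schema (and the image path) as an assumption:

```python
# Sketch of a "negative" instruction-tuning sample that teaches the model
# to refuse questions about objects that are not in the image.
import json

negative_sample = {
    "id": "neg_000001",
    "image": "coco/train2017/000000123456.jpg",  # hypothetical car-free image
    "conversations": [
        {"from": "human", "value": "<image>\nWhat color is the car in this image?"},
        {"from": "gpt", "value": "There is no car in this image, so I cannot describe its color."},
    ],
}

with open("negative_samples.json", "w") as f:
    json.dump([negative_sample], f, indent=2)
```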
Anyway, thanks for reading my request. I think LLaVA has great potential to become an open-source GPT-4 and maybe eventually even surpass it.
Hi @Bensolz, thanks for your interest in our work and for the great feedback. Hallucination is definitely one of the most important weaknesses we are striving to tackle; it is harder to address due to the existence of a separate vision encoder. Please stay tuned for future updates, thanks!
Thanks for the reply. Interestingly, I noticed that MiniGPT-4 seems to be less vulnerable to this, and it uses BLIP-2's vision encoder, so maybe just using a better vision encoder together with the latest Vicuna checkpoint could mitigate these issues.