shikiw / OPERA

[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Does it work well on videos? #1

Closed: YajieW99 closed this issue 5 months ago

YajieW99 commented 9 months ago

Nice work! Have you tried it on videos? Or does it also work well with video-text models such as VideoChat, Video-LLaMA, and Video-ChatGPT?

shikiw commented 9 months ago

Thanks for your appreciation!

Actually, I have not tried it on video-text models yet, and I'm not sure whether models like Video-LLaMA show the same "over-trust" patterns when generating hallucinations.

But I think this is a valuable point well worth exploring, and we will delve into it in future work :)
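
As a rough way to probe this yourself, the sketch below shows one hedged approach: inspect the decoder's attention during generation and check whether new tokens keep attending overwhelmingly to a single earlier "summary" token, which is the kind of over-trust signature OPERA's penalty targets. This is not OPERA's implementation; it assumes a generic Hugging Face causal-LM backbone (the model name and prompt are placeholders), and a real test would run it on the language-model decoder inside the video-text model with the video features already injected.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint name; substitute the LM backbone of the
# video-text model you want to probe (e.g. the decoder inside Video-LLaMA).
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    attn_implementation="eager",  # needed so attention weights are returned
)
model.eval()

prompt = "Describe the events in the video."  # placeholder text prompt
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        output_attentions=True,
        return_dict_in_generate=True,
    )

# out.attentions: one tuple per generated step; each entry holds per-layer
# attention tensors of shape (batch, heads, query_len, key_len).
for step, layers in enumerate(out.attentions):
    last_layer = layers[-1][0]            # (heads, query_len, key_len)
    attn = last_layer.mean(dim=0)[-1]     # average heads, take the newest query token
    top_weight, top_idx = attn.max(dim=-1)
    # If one key position stays dominant step after step, the model may be
    # "over-trusting" an aggregation token rather than the visual evidence.
    print(f"step {step}: strongest attention {top_weight:.3f} on position {top_idx.item()}")
```

Whether this pattern actually correlates with hallucinated content on video inputs would still need the kind of systematic check the paper did for image models.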