mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

Same output for any system prompt #103

Closed: yangzhj53 closed this issue 4 days ago

yangzhj53 commented 1 month ago

I followed the steps in the tutorial to evaluate the model on MSVD-QA, but I found that no matter how I set the system prompt, the model's answers stayed almost the same. Even when the prompt explicitly asked the model to answer 'Yes', the output barely changed. Is this expected?

[Attached image: WeChat screenshot 微信图片_20240505204617]
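One way to put a number on "almost the same" is to compare the predictions from two runs that differ only in the system prompt. The snippet below is a minimal sketch, not part of the Video-ChatGPT evaluation scripts: it assumes each run's predictions have already been loaded into a dict mapping a question id to the answer string, and the names `normalize` and `agreement_rate` are hypothetical helpers introduced here for illustration.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(answer.lower().split())


def agreement_rate(preds_a: dict, preds_b: dict) -> float:
    """Fraction of shared question ids whose answers match after normalization.

    preds_a / preds_b: {question_id: answer} from two runs that differ only
    in the system prompt (assumed format, not the repo's own output format).
    """
    shared = preds_a.keys() & preds_b.keys()
    if not shared:
        raise ValueError("the two runs share no question ids")
    same = sum(normalize(preds_a[q]) == normalize(preds_b[q]) for q in shared)
    return same / len(shared)


# Toy data, invented for illustration only.
default_prompt_run = {"q1": "A man is cooking.", "q2": "A dog"}
answer_yes_run = {"q1": "a man is  cooking.", "q2": "Yes"}
rate = agreement_rate(default_prompt_run, answer_yes_run)
```

A rate close to 1.0 across the whole evaluation set would suggest the system prompt is having little effect on the output.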

mmaaz60 commented 4 days ago

Hi @yangzhj53,

Thank you for your interest in our work. It looks like the model is not respecting the system/user prompt in this particular case, which is one of the model's limitations.

Training on some language-only instruction data (e.g. ShareGPT data) along with VideoInstruct-100K may improve this.
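As a rough sketch of what such mixing could look like, assuming both datasets have already been loaded as Python lists of samples: the function name `mix_instruction_data` and the 20% language-only share are illustrative assumptions, not values from the Video-ChatGPT training recipe.

```python
import random


def mix_instruction_data(video_samples, text_samples, text_ratio=0.2, seed=0):
    """Combine video instruction samples with a subset of language-only samples.

    text_ratio is the target share of language-only samples in the mixed set.
    The default of 0.2 is an arbitrary assumption for illustration.
    """
    if not 0 <= text_ratio < 1:
        raise ValueError("text_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Number of text samples needed so they make up text_ratio of the total.
    n_text = round(len(video_samples) * text_ratio / (1 - text_ratio))
    n_text = min(n_text, len(text_samples))
    mixed = list(video_samples) + rng.sample(list(text_samples), n_text)
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for VideoInstruct-100K and ShareGPT entries.
video = [{"source": "video", "id": i} for i in range(80)]
text = [{"source": "text", "id": i} for i in range(100)]
mixed = mix_instruction_data(video, text, text_ratio=0.2)
```

The mixed list would then be written out in whatever format the training script expects. Language-only samples have no video features, so the data loader would also need to handle that case.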

Please do share if you manage to fix this behavior. Thank you and good luck :)