mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

Question about result #48

Closed: jhj7905 closed this issue 9 months ago

jhj7905 commented 10 months ago

@hanoonaR @mmaaz60 Hello, thank you for sharing this excellent work. I have checked the model's output. Even when I input the same video, the results differ between runs, for example 'The person in the video is using a cellphone' versus 'The person in the video is holding a cellphone'. Can you tell me how to control the output? I would like to get the same result every time. Thank you in advance.

mmaaz60 commented 10 months ago

Hi @jhj7905,

Thank you for your interest in our work. Video-ChatGPT uses Vicuna to generate its text output. Because Vicuna is a generative LLM, it is expected to produce somewhat different outputs with similar meaning across runs. I hope this answers your question. Thanks
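For readers wondering where the run-to-run variation comes from: generative LLMs are commonly decoded by sampling the next token from a probability distribution, so near-equal candidates such as "using" and "holding" can swap between runs. The sketch below is a toy illustration only, assuming that kind of temperature sampling. It does not call Video-ChatGPT or Vicuna, and the scores in `LOGITS` and the helpers `pick_token` and `sample_run` are hypothetical names made up for this example. It shows the two usual ways to get repeatable output: greedy decoding (always take the top token) or sampling with a fixed random seed.

```python
import math
import random

# Toy next-token scores (hypothetical values, not taken from Video-ChatGPT).
LOGITS = {"using": 2.0, "holding": 1.8, "watching": 0.5}


def softmax(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def pick_token(logits, temperature=1.0, rng=None):
    """Greedy choice when temperature is 0, otherwise sample from the softmax."""
    if temperature == 0:
        return max(logits, key=logits.get)
    if rng is None:
        rng = random  # unseeded module-level generator: varies between runs
    probs = softmax(logits, temperature)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def sample_run(seed, n=5, temperature=1.0):
    """Sample n tokens with a seeded generator so the run can be repeated."""
    rng = random.Random(seed)
    return [pick_token(LOGITS, temperature, rng) for _ in range(n)]


# Greedy decoding returns the same token on every call.
greedy_tokens = {pick_token(LOGITS, temperature=0) for _ in range(100)}

# Seeded sampling is still random, but identical for identical seeds.
first_run = sample_run(seed=0)
second_run = sample_run(seed=0)
```

In libraries such as Hugging Face `transformers`, the analogous controls are typically `do_sample=False` in `generate()` or a fixed seed set before generation. Whether and where Video-ChatGPT's inference scripts expose these settings is not stated in this thread, so treat that as something to check in the repository.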