jhj7905 closed this issue 9 months ago.
Hi @jhj7905,
Thank you for your interest in our work. Video-ChatGPT uses Vicuna to generate its text output. Because Vicuna is a generative LLM, its output is expected to vary somewhat between runs while keeping a similar meaning. I hope this answers your question. Thanks
@hanoonaR @mmaaz60 Hello, thank you for sharing this excellent work. I have checked the model's output. Even though I input the same video, the results differ between runs, for example 'The person in the video is using a cellphone' versus 'The person in the video is holding a cellphone'. Can you tell me how to control this? I want to get the same result every time. Thank you in advance.
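A minimal sketch of why this happens and the two usual ways to make it repeatable. It assumes, without confirming from the repository, that the inference script decodes with sampling enabled. Sampling draws each next token at random in proportion to its probability, so "using" and "holding" can both win on different runs. Greedy decoding (always take the most probable token) removes the randomness, and fixing the random seed makes the sampled draws repeat. The next-token probabilities below are made up for illustration; in Hugging Face transformers the corresponding knobs would be `model.generate(..., do_sample=False)` for greedy decoding or calling `transformers.set_seed(...)` before sampling, but check where Video-ChatGPT's own inference code calls `generate` before relying on this.

```python
import random

# Hypothetical next-token distribution after "The person in the video is ..."
# (probabilities are invented for illustration, not taken from Vicuna).
NEXT_TOKEN_PROBS = {
    "using": 0.55,
    "holding": 0.40,
    "talking": 0.05,
}


def sample_token(probs, rng):
    """Sampling decoding: draw a token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def greedy_token(probs):
    """Greedy decoding: always pick the most probable token."""
    return max(probs, key=probs.get)


def run(mode, seed=None, runs=5):
    """Decode the same 'input' several times and collect the chosen tokens."""
    outputs = []
    for _ in range(runs):
        if mode == "greedy":
            outputs.append(greedy_token(NEXT_TOKEN_PROBS))
        else:
            # A fresh generator per run: seed=None seeds from system entropy,
            # so draws can differ; a fixed seed replays the same draws each run.
            rng = random.Random(seed)
            outputs.append(sample_token(NEXT_TOKEN_PROBS, rng))
    return outputs


if __name__ == "__main__":
    print("sampling, no seed:", run("sample"))
    print("sampling, seed=0: ", run("sample", seed=0))
    print("greedy:           ", run("greedy"))
```

Greedy decoding gives the same answer regardless of seed, at the cost of less varied phrasing. Even with a fixed seed, results may still differ across GPUs or library versions because some GPU kernels are not bit-for-bit deterministic.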