[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
I'm very interested in your work, but I have some questions about the experimental details. In the following table, is the VideoChat model the 13B or the 7B version? Likewise, is the Video-LLaMA model the 13B or the 7B version? Looking forward to your reply.
Thank you for sharing the good work!