[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Thank you for your interest in our work. We use the validation set for MSVD and MSRVTT and the test set for TGIF. A copy of the annotation files has been attached in the links.
I want to ask which dataset split is used for zero-shot evaluation on MSVD, MSRVTT, and TGIF. Is it the validation set or the test set?