[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
I have a small question. The link you provide contains the relevant test sets, but for both MSVD and MSRVTT the file is val_Qa.json. Were the experimental results in this work obtained on test_qa or val_qa?