mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International
1.05k stars, 92 forks

How to evaluate the MSVD-QA and MSRVTT-QA #54

Open jhj7905 opened 9 months ago

jhj7905 commented 9 months ago

@mmaaz60 @hanoonaR Hello, thank you for sharing this excellent work. Could you explain in detail how to evaluate on MSVD-QA and MSRVTT-QA?

hanoonaR commented 8 months ago

Hi @jhj7905,

Apologies for the late reply. Please check the instructions given for zero-shot ActivityNet inference here. We follow the same steps, with only minimal changes to adapt to the MSVD-QA and MSRVTT-QA datasets. The evaluation protocol remains the same across all datasets. If you have a specific question or are stuck on a problem, please let us know.
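
Since the evaluation protocol is shared across datasets, the final aggregation step can look the same for MSVD-QA, MSRVTT-QA, and ActivityNet. The following is only a minimal sketch and not the repository's actual script. It assumes each judged answer has already been reduced to a record with a yes/no `pred` field and a numeric `score` field. Those field names, the helper `summarize_judgments`, and the sample records are all hypothetical and chosen for illustration.

```python
from typing import Iterable, Mapping, Tuple


def summarize_judgments(judgments: Iterable[Mapping[str, object]]) -> Tuple[float, float]:
    """Return (accuracy, average score) over per-question judgments.

    Assumption: each record has a 'pred' field ("yes"/"no") and a numeric
    'score' field. These names are illustrative, not confirmed by the thread.
    """
    total = 0
    correct = 0
    score_sum = 0.0
    for record in judgments:
        total += 1
        # Count a prediction as correct only when the judge answered "yes".
        if str(record["pred"]).strip().lower() == "yes":
            correct += 1
        score_sum += float(record["score"])
    if total == 0:
        raise ValueError("no judgments to summarize")
    return correct / total, score_sum / total


# Hypothetical judged results for four questions (made-up values).
example = [
    {"pred": "yes", "score": 5},
    {"pred": "no", "score": 1},
    {"pred": "yes", "score": 4},
    {"pred": "no", "score": 2},
]

accuracy, avg_score = summarize_judgments(example)
print(f"accuracy={accuracy:.2f}, average score={avg_score:.2f}")
```

Because the aggregation only depends on the per-question records, switching datasets would in principle only change how the question, answer, and prediction files are prepared upstream.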

wcy1122 commented 8 months ago

Hello, may I know where to download the videos for MSRVTT-QA? It looks like the official website [https://ms-multimedia-challenge.com/2016/dataset] is no longer maintained.