mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International
1.23k stars, 108 forks

How to evaluate the MSVD-QA and MSRVTT-QA #54

Open jhj7905 opened 1 year ago

jhj7905 commented 1 year ago

@mmaaz60 @hanoonaR Hello, thank you for sharing this excellent work. Can you explain in detail how to evaluate on MSVD-QA and MSRVTT-QA?

hanoonaR commented 1 year ago

Hi @jhj7905,

Apologies for the late reply. Kindly check the instructions given for zero-shot ActivityNet inference here. We follow the same steps, with very minimal changes to adapt to the MSVD-QA and MSRVTT-QA datasets. The evaluation protocol, however, remains the same across all datasets. If you have a specific question or are stuck on a problem, please let us know.
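For readers adapting the evaluation to another dataset, the final scoring step of a GPT-assisted QA evaluation might look roughly like the sketch below. It is not the repository's script. It assumes the judge model has been asked to reply with a Python-dict string containing a yes/no `pred` and a numeric `score`, and the helper names `parse_judgement` and `summarize` are hypothetical. Because the aggregation only looks at the judge replies, the same code would apply unchanged to MSVD-QA, MSRVTT-QA, or ActivityNet-QA outputs.

```python
import ast


def parse_judgement(raw):
    """Parse one judge reply such as "{'pred': 'yes', 'score': 4}".

    Returns (pred, score), or None if the reply is malformed.
    The reply format is an assumption made for this sketch.
    """
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(parsed, dict):
        return None
    pred = str(parsed.get("pred", "")).strip().lower()
    score = parsed.get("score")
    if pred not in ("yes", "no"):
        return None
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return None
    return pred, float(score)


def summarize(raw_replies):
    """Aggregate a list of judge replies into accuracy and average score."""
    judgements = [parse_judgement(r) for r in raw_replies]
    valid = [j for j in judgements if j is not None]
    skipped = len(raw_replies) - len(valid)
    if not valid:
        return {"accuracy": 0.0, "average_score": 0.0, "judged": 0, "skipped": skipped}
    yes_count = sum(1 for pred, _ in valid if pred == "yes")
    total_score = sum(score for _, score in valid)
    return {
        "accuracy": yes_count / len(valid),       # fraction judged correct
        "average_score": total_score / len(valid),
        "judged": len(valid),
        "skipped": skipped,                        # malformed replies, not counted
    }


if __name__ == "__main__":
    replies = [
        "{'pred': 'yes', 'score': 4}",
        "{'pred': 'no', 'score': 1}",
        "not a dict",
    ]
    print(summarize(replies))
```

Malformed replies are skipped rather than counted as wrong, which is a design choice of this sketch; counting them as incorrect would give a more conservative accuracy.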

wcy1122 commented 1 year ago

Hello, may I know where to download the videos for MSRVTT-QA? It looks like the official website [https://ms-multimedia-challenge.com/2016/dataset] is no longer maintained.

hb-jw commented 4 months ago

Hello, I've also been replicating related benchmarks recently. They mostly rely on GPT-assisted evaluation, which seems quite costly. Approximately how much does each of your evaluations cost?
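In the absence of reported figures, a back-of-the-envelope estimate can be computed from the number of QA pairs, the tokens per judge request, and the per-token price. The sketch below only shows the arithmetic. The function name `estimate_cost` is hypothetical, and every number in the example call is a placeholder rather than a real dataset size or a real API price, so substitute your own question count and your provider's current pricing.

```python
def estimate_cost(num_questions, prompt_tokens, completion_tokens,
                  usd_per_1k_prompt, usd_per_1k_completion):
    """Estimate the total judge cost in USD for one evaluation run.

    Assumes one judge request per question and per-1K-token pricing
    that differs between prompt and completion tokens.
    """
    prompt_cost = num_questions * prompt_tokens / 1000 * usd_per_1k_prompt
    completion_cost = num_questions * completion_tokens / 1000 * usd_per_1k_completion
    return prompt_cost + completion_cost


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured or quoted figures.
    total = estimate_cost(
        num_questions=1000,
        prompt_tokens=300,
        completion_tokens=20,
        usd_per_1k_prompt=0.001,
        usd_per_1k_completion=0.002,
    )
    print(f"Estimated cost: ${total:.2f}")
```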