mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

About zero-shot test on TGIF-QA #84

Closed by msra-jqxu 5 months ago

msra-jqxu commented 5 months ago

Hi @mmaaz60, I noticed that the TGIF-QA test set covers four tasks: Repetition Count, Repeating Action, State Transition, and Frame QA. Were the TGIF-QA results in your experiments (51.40/3.0) obtained by testing on all four tasks (25,751 QA pairs in total), or only on Frame QA? Thanks!

The table below shows the TGIF-QA statistics.

[image: TGIF-QA statistics table]

mmaaz60 commented 5 months ago

Hi @msra-jqxu,

Thank you for your interest in our work. We used only the TGIF Frame QA task in our experiments, and we applied the same setting to all compared methods.
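
For readers trying to reproduce this setting, here is a minimal sketch of restricting a mixed TGIF-QA test split to the Frame QA task before evaluation. The record layout is an assumption made for illustration: the `task` field, the label strings such as `frameqa`, and the toy questions are hypothetical and are not taken from the Video-ChatGPT repository or from the TGIF-QA release.

```python
from collections import Counter

# Toy records standing in for a mixed TGIF-QA test split.
# ASSUMPTION: the "task" field and its label strings are hypothetical.
qa_pairs = [
    {"task": "count", "question": "How many times does the man jump?", "answer": "3"},
    {"task": "action", "question": "What does the cat do 2 times?", "answer": "blink"},
    {"task": "transition", "question": "What does the girl do after smiling?", "answer": "wave"},
    {"task": "frameqa", "question": "What color is the car?", "answer": "red"},
    {"task": "frameqa", "question": "What is the dog holding?", "answer": "ball"},
]


def select_task(pairs, task):
    """Return only the QA pairs whose task label equals `task`."""
    return [pair for pair in pairs if pair["task"] == task]


# Count pairs per task to see what the split contains.
task_counts = Counter(pair["task"] for pair in qa_pairs)

# Keep only the Frame QA subset, the one used for the reported results.
frame_qa = select_task(qa_pairs, "frameqa")

print(dict(task_counts))
print(len(frame_qa), "Frame QA pairs selected for evaluation")
```

Only the `frame_qa` subset would then be passed to the evaluation step, so numbers computed over all four tasks are not directly comparable to the reported TGIF-QA results.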