mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

About the evaluation time #29

Closed wyzjack closed 11 months ago

wyzjack commented 11 months ago

Hi authors,

Thanks for the great work! I am running your evaluation code on the ActivityNet-QA dataset following https://github.com/mbzuai-oryx/Video-ChatGPT/blob/main/quantitative_evaluation/README.md, and inference takes 4-5 hours on a single A100 80G GPU. I am wondering whether that is normal. Thanks.

I would appreciate it very much if you could reply.

Thanks

hanoonaR commented 11 months ago

Hi @wyzjack,

Thanks for your interest in our work and for reaching out with your question.

Indeed, the ActivityNet-QA dataset is fairly large, containing around 8000 questions, which can make the inference process quite time-consuming, especially when run on a single GPU.

In our work, to expedite inference, we divided the workload across multiple GPUs. We accomplished this with a simple hack: splitting the contents of the question JSON file into smaller chunks, each of which was processed on a separate GPU. After inference finished, we combined the results from each chunk. In our experience, running the task on 4 or 8 GPUs substantially decreases the inference time.
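For anyone who wants to replicate this, the split-and-merge hack could be sketched roughly as follows. The file names, the flat-list JSON structure, and the helper names are illustrative assumptions, not the repository's actual scripts; each chunk file would then be passed to a separate inference run (e.g. one per `CUDA_VISIBLE_DEVICES` setting):

```python
import json


def split_json(path, num_chunks):
    """Split a JSON file containing a list of questions into
    roughly equal chunk files (assumed format: a flat JSON list).
    Returns the list of chunk file paths."""
    with open(path) as f:
        data = json.load(f)
    # Ceiling division so all items are covered.
    chunk_size = (len(data) + num_chunks - 1) // num_chunks
    chunk_paths = []
    for i in range(num_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        out_path = f"questions_chunk_{i}.json"  # hypothetical naming scheme
        with open(out_path, "w") as f:
            json.dump(chunk, f)
        chunk_paths.append(out_path)
    return chunk_paths


def merge_results(result_paths, out_path):
    """Concatenate per-GPU result files (each a JSON list)
    back into a single list, preserving chunk order."""
    merged = []
    for p in result_paths:
        with open(p) as f:
            merged.extend(json.load(f))
    with open(out_path, "w") as f:
        json.dump(merged, f)
```

After running inference on each chunk independently, `merge_results` would reassemble the per-GPU outputs into one file for the evaluation script.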

I hope this information is helpful. Let me know if you have any other questions!

wyzjack commented 11 months ago

Got it, thanks!