mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

Inference Issue: 'mmco: unref short failure' and Question Key Retrieval Problem #61

Closed Kratos-Wen closed 11 months ago

Kratos-Wen commented 11 months ago

Hi, thank you for the great work!

I've encountered a couple of issues while running inference following the instructions in "QuantitativeEvaluation.md":

  1. H264 decoding error: When I attempt to run the inference, the process doesn't terminate, and I repeatedly receive the following error: [h264 @ 0x35aaf840] mmco: unref short failure

  2. Question retrieval in 'run_inference_benchmark_general.py': I noticed that the script expects to find the question using a 'Q' key in the JSON file. However, in "consistency_qa.json", the questions are keyed with "Q" followed by a number (e.g., "Q1", "Q2", etc.). This discrepancy seems to prevent the correct retrieval of the questions.

Is this a known issue, or am I possibly misinterpreting the correct usage? Any guidance on these issues would be greatly appreciated. Thank you in advance for your assistance!

hanoonaR commented 11 months ago

Hi @Kratos-Wen,

Thank you for your interest in our work. 1) I would suggest ignoring these, as they are harmless decoder warnings. (https://github.com/mbzuai-oryx/Video-ChatGPT/issues/28#issuecomment-1651526374)

2) No, you are correct. Apologies for not making this clear in the released code. For benchmarking, we evaluate five criteria: correctness, detail orientation, context, temporal understanding, and consistency. Except for consistency, every criterion requires only one question per sample. For consistency, however, we want to test how the model responds when asked about the same concept from two different perspectives. So, as you can see in the evaluation, we make use of both questions.
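A minimal sketch of how the consistency annotations might be consumed, assuming the structure described above: the field names ("video_name", "Q1", "Q2", "A") follow the issue discussion, not necessarily the exact released schema.

```python
# Hypothetical sketch: pairing the two rephrased questions ("Q1", "Q2") in a
# consistency_qa.json entry, rather than looking up a single "Q" key.
# Field names are assumptions based on this issue thread.

def consistency_pairs(samples):
    """Yield both rephrasings of each question with the shared answer,
    so an evaluator can check that the model answers them consistently."""
    for s in samples:
        yield s["Q1"], s["Q2"], s["A"]

sample = {
    "video_name": "v_example",
    "Q1": "What is the person doing at the start of the video?",
    "Q2": "At the beginning of the clip, what activity is shown?",
    "A": "The person is tying their shoelaces.",
}

for q1, q2, ans in consistency_pairs([sample]):
    # Both questions would be sent to the model; the evaluator then compares
    # the two responses against the same reference answer.
    print(q1)
    print(q2)
    print(ans)
```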

Hope it's clear.

Kratos-Wen commented 11 months ago

Hi @hanoonaR,

Yes, thank you very much, that answers my question very well!