mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International
1.05k stars 92 forks

GPT Judge does not work for the Prompt #89

Closed Jeff-Zilence closed 4 months ago

Jeff-Zilence commented 4 months ago

Thanks for sharing the great work.

We have tried to reproduce the evaluation results following https://github.com/mbzuai-oryx/Video-ChatGPT/blob/main/quantitative_evaluation/evaluate_benchmark_1_correctness.py The correctness prompt works well for most cases (some still do not work even if we query GPT multiple times), but the detail orientation prompt does not work for all cases.

Could you verify that the GPT-3.5 judge still works for this repo now? The OpenAI GPT models seem to keep changing even when the same model API is used.
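
To help narrow down whether the failures come from the judge's reply format or from the scoring itself, here is a minimal sketch of more tolerant reply parsing. It assumes the judge is asked to answer with a Python-dict-like string containing a `score` key (for example `{'score': 4}`); that format and the helper name `parse_judge_score` are assumptions for illustration and are not taken from the repo's scripts.

```python
import ast
import re
from typing import Optional


def parse_judge_score(reply: str) -> Optional[int]:
    """Extract an integer score from a judge reply, or return None.

    Hypothetical helper: assumes the reply should contain a dict literal
    with a 'score' key, possibly wrapped in extra prose.
    """
    # First candidate: the whole reply is a dict literal.
    candidates = [reply.strip()]
    # Second candidate: a dict literal embedded in surrounding text.
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match:
        candidates.append(match.group(0))

    for candidate in candidates:
        try:
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue  # not a valid literal, try the next candidate
        if isinstance(parsed, dict) and "score" in parsed:
            try:
                return int(parsed["score"])
            except (TypeError, ValueError):
                continue  # score present but not convertible to int

    # Last resort: a bare "score: 4"-style mention in plain text.
    match = re.search(r"score\W+(\d+)", reply, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None
```

Replies for which this returns `None` can be logged and inspected by hand, which should show whether the model has stopped following the requested output format for a given prompt.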