GPT Judge does not work for the Prompt

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Creative Commons Attribution 4.0 International

1.05k stars, 92 forks

Thanks for sharing the great work.

We have tried to reproduce the evaluation results following https://github.com/mbzuai-oryx/Video-ChatGPT/blob/main/quantitative_evaluation/evaluate_benchmark_1_correctness.py. The correctness prompt works for most cases (some still fail even when we query GPT multiple times), but the detail orientation prompt does not work for all cases.

Could you verify that the GPT-3.5 judge still works for this repo? The OpenAI GPT models seem to change over time, even when the same model API is used.
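For anyone hitting the same failures, here is a minimal sketch of a more tolerant way to read the judge's reply. It assumes the failures come from replies that are not a clean dictionary string (for example, extra text around the dictionary), which may or may not be what is happening here. The function name `parse_judge_score` and the `'score'` key are assumptions for illustration, not code taken from the repo.

```python
import ast
import re
from typing import Optional


def parse_judge_score(reply: str) -> Optional[float]:
    """Hypothetical helper: pull a numeric 'score' out of a judge reply.

    Returns None when no score can be recovered, so the caller can
    decide whether to retry the query or skip the sample.
    """
    # First attempt: find a dict-like literal anywhere in the reply.
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match:
        try:
            parsed = ast.literal_eval(match.group(0))
            if isinstance(parsed, dict) and "score" in parsed:
                return float(parsed["score"])
        except (ValueError, SyntaxError, TypeError):
            # Malformed literal or non-numeric score: fall through.
            pass

    # Fallback: the word "score" followed by a number, e.g. "Score: 2".
    match = re.search(r"score\W*?(\d+(?:\.\d+)?)", reply, flags=re.IGNORECASE)
    if match:
        return float(match.group(1))

    return None
```

A reply that yields `None` can then be counted and logged separately instead of raising, which makes it easier to see how often each prompt fails.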

GPT Judge does not work for the Prompt #89