[benchmark] Some questions about the details to generate files in step 1 during `Video-based Generative Performance Benchmarking`.

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Creative Commons Attribution 4.0 International

Hi @Aopolin-Lv,

Apologies for the late reply. Note that the question-answer pairs (gt_file) are the same for the correctness, detail orientation, and contextual understanding criteria.

So, in your case, where you want to evaluate the correctness and detail orientation criteria: if you run the first step using the command below, the predictions are stored in the directory you pass as `--output_dir <output-dir-path>`.

    python video_chatgpt/eval/run_inference_benchmark_general.py \
        --video_dir <path-to-directory-containing-videos> \
        --gt_file <ground-truth-file-containing-question-answer-pairs> \
        --output_dir <output-dir-path> \
        --output_name <output-file-name> \
        --model-name <path-to-LLaVA-Lightening-7B-v1-1> \
        --projection_path <path-to-Video-ChatGPT-weights>

To evaluate the criteria, pass the same `--output_dir` path as the pred_path in the evaluation step.
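If it helps, here is a rough Python sketch of that hand-off. It only assembles the step-1 command with the flags from this thread and then reads back the `--output_dir` value, which is the one to reuse as pred_path. The helper names and the placeholder paths are hypothetical (they are not part of the repo), and the evaluation script's own arguments are deliberately left out.

```python
import shlex
from pathlib import Path


def build_inference_command(video_dir, gt_file, output_dir, output_name,
                            model_name, projection_path):
    """Assemble the step-1 inference command as an argv list.

    Hypothetical helper: only the script path and flag names come from
    the command quoted in this thread.
    """
    return [
        "python", "video_chatgpt/eval/run_inference_benchmark_general.py",
        "--video_dir", str(video_dir),
        "--gt_file", str(gt_file),
        "--output_dir", str(output_dir),
        "--output_name", str(output_name),
        "--model-name", str(model_name),
        "--projection_path", str(projection_path),
    ]


def pred_path_for_evaluation(inference_cmd):
    """Return the value passed to --output_dir, to be reused as pred_path."""
    idx = inference_cmd.index("--output_dir")
    return inference_cmd[idx + 1]


# Placeholder values (assumptions for illustration only).
output_dir = Path("outputs/benchmark_general")
cmd = build_inference_command(
    video_dir="data/videos",
    gt_file="data/generic_qa.json",
    output_dir=output_dir,
    output_name="video_chatgpt_generic_qa",
    model_name="weights/LLaVA-Lightening-7B-v1-1",
    projection_path="weights/video_chatgpt.bin",
)

# Print the command so it can be copied into a shell.
print(shlex.join(cmd))

# The same directory is what the evaluation step should receive as pred_path.
pred_path = pred_path_for_evaluation(cmd)
print("pred_path for the evaluation step:", pred_path)
```

Since the question-answer pairs are shared, one inference run should be enough for both criteria, and the same `pred_path` value can be passed to each evaluation.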

Hope it's clear now.

[benchmark] Some questions about the details to generate files in step 1 during `Video-based Generative Performance Benchmarking`. #47