open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

The performance of Cambrian-1-8B. #439

Closed: Haochen-Wang409 closed this issue 2 months ago

Haochen-Wang409 commented 2 months ago

Hello, I am using this code to evaluate the official checkpoint of Cambrian-1-8B, but I cannot reproduce the reported results. Specifically, I get only 64.78 on MMBench-DEV-EN, while the reported score is 75.9 (Table 8 of the paper).

Could you help me reproduce the results?

PhoenixZ810 commented 2 months ago

Hi,

Thank you for reporting this! We found that the gap is caused by Cambrian-1-8B's limited instruction-following ability under the default MMBench prompt. We recommend modifying the MMBench prompt as follows:

if len(options):
    prompt += options_prompt
    # Original instruction, disabled:
    # prompt += 'Please select the correct answer from the options above. \n'
    # Replacement instruction that asks for the option letter only:
    prompt += '\n' + "Answer with the option's letter from the given choices directly."

With this change, the MMBench-DEV-EN score for Cambrian-1-8B should improve to 76.29.
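
For anyone who wants to see the effect of the change in isolation, here is a minimal, self-contained sketch of how the final prompt could be assembled. The function name `build_prompt`, the `Question:`/`Options:` layout, and the dict-based `options` argument are assumptions made for illustration; they are not taken from VLMEvalKit's actual code. Only the instruction string comes from the snippet above.

```python
# Hypothetical helper for illustration only; not VLMEvalKit's actual API.
DIRECT_ANSWER_INSTRUCTION = (
    "Answer with the option's letter from the given choices directly."
)


def build_prompt(question, options):
    """Build a multiple-choice prompt ending with the direct-answer instruction.

    `options` maps option letters (e.g. "A") to the option text.
    The prompt layout here is an assumed format, not the toolkit's own.
    """
    prompt = f"Question: {question}\n"
    if len(options):
        options_prompt = "Options:\n"
        for letter in sorted(options):
            options_prompt += f"{letter}. {options[letter]}\n"
        prompt += options_prompt
        # Replaces: 'Please select the correct answer from the options above. \n'
        prompt += "\n" + DIRECT_ANSWER_INSTRUCTION
    return prompt


example = build_prompt("What color is the sky?", {"A": "Blue", "B": "Green"})
print(example)
```

Printing the prompt for a sample item before running the full benchmark is a quick way to check that the direct-answer instruction is the last line the model sees.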

Best regards.

Haochen-Wang409 commented 2 months ago

Thanks for the reply! I have successfully reproduced the results!