open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.34k stars, 188 forks

[Model] add support for Llama-3.2-11B/90B-Vision-Instruct #490

Closed: FangXinyu-0913 closed this 1 month ago

uyzhang commented 1 month ago

Why is there a significant difference between the scores obtained with the llama-3.2-11b-instruct model here and the official scores reported on Hugging Face? For example, on the AI2D benchmark, the officially reported score is 91.1 (https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct), but with this code I only obtained around 75.

FangXinyu-0913 commented 1 month ago

Hi @uyzhang, thank you very much for pointing this out. Due to time constraints, we simply added the model and did not adjust the hyperparameters or system prompt to match the evaluation details. We will align the results when we have time. Thanks for the reminder.