open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.27k stars 181 forks

Qwen-VL-Max-0809: measured MME celebrity score differs from the leaderboard result by about 10 points #571

Open lihua8848 opened 2 hours ago

lihua8848 commented 2 hours ago

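All of the test code is from VLMEvalKit. I only changed the API model to qwen-vl-max-0809 and ran the celebrity subtask of MME. I did not touch the prompts or the score calculation. Why does my result differ so much from the leaderboard?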
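For reference when comparing the numbers, here is a minimal sketch of how an MME subtask score is usually described: each image has two yes/no questions, and the score is accuracy over all questions plus "accuracy+" over images (both questions correct), each as a percentage, for a maximum of 200. This is an assumption based on the MME metric definition, not a copy of VLMEvalKit's scoring code; the function name `mme_subtask_score` and the record layout are hypothetical.

```python
from collections import defaultdict


def mme_subtask_score(records):
    """Sketch of an MME-style subtask score (assumed definition, max 200).

    records: iterable of (image_id, prediction, answer) tuples, where
    prediction and answer are "yes"/"no" strings. Hypothetical layout,
    not VLMEvalKit's internal format.
    """
    # Group per-question correctness by image.
    per_image = defaultdict(list)
    for image_id, prediction, answer in records:
        hit = prediction.strip().lower() == answer.strip().lower()
        per_image[image_id].append(hit)

    total_questions = sum(len(hits) for hits in per_image.values())
    if total_questions == 0:
        raise ValueError("no records to score")

    # acc: fraction of individual questions answered correctly.
    acc = sum(sum(hits) for hits in per_image.values()) / total_questions
    # acc+: fraction of images whose questions are all answered correctly.
    acc_plus = sum(all(hits) for hits in per_image.values()) / len(per_image)

    return 100.0 * (acc + acc_plus)


if __name__ == "__main__":
    demo = [
        ("img_a", "Yes", "yes"),
        ("img_a", "No", "no"),
        ("img_b", "yes", "yes"),
        ("img_b", "yes", "no"),
    ]
    print(mme_subtask_score(demo))
```

Under this definition a single flipped answer can lower both terms at once, so a gap of about 10 points can come from a fairly small number of differing responses.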

[Three screenshots attached]
lihua8848 commented 2 hours ago

The third image is from the leaderboard as it appeared a few days ago. Why has the 1101 update disappeared?