open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.39k stars, 194 forks

[Add] Benchmark: NaturalBench (NeurIPS24) #582

Closed: Baiqi-Li closed this 2 weeks ago

Baiqi-Li commented 2 weeks ago

Add NaturalBench (NeurIPS24): paper, dataset

Baiqi-Li commented 2 weeks ago

Hey, I have fixed the code style according to the pre-commit requirements. Please review and check the code. Thank you!

PhoenixZ810 commented 2 weeks ago

Hi, we evaluated InternVL2-8B on NaturalBench, and the results are slightly lower than those reported in the paper, as shown below. Could you please let us know whether this discrepancy is significant? [image]

Baiqi-Li commented 2 weeks ago

@PhoenixZ810 @linzhiqiu The score is only about 1.% lower than what's reported in the paper, which I think is reasonable. We ensured that the dataset we uploaded is exactly the same as the one used in the paper. Possible reasons for the discrepancy include:

We will run further tests across multiple open-source platforms and consider updating the results in the paper and on the leaderboard. Thank you very much!