[Add] Benchmark: NaturalBench (NeurIPS24)

Baiqi-Li commented 2 weeks ago

Add NaturalBench (NeurIPS24): paper, dataset

Baiqi-Li commented 2 weeks ago

Hey, I have fixed the code style according to the pre-commit requirements. Please review and check the code. Thank you!

PhoenixZ810 commented 2 weeks ago

Hi, we evaluated InternVL2-8B on NaturalBench, and the results are slightly lower than those reported in the paper, as shown below. Could you please let us know whether this discrepancy is significant?

Baiqi-Li commented 2 weeks ago

@PhoenixZ810 @linzhiqiu The score is only about 1.% lower than what's reported in the paper, which I think is reasonable. We ensured that the dataset we uploaded is exactly the same as the one used in the paper. Possible reasons for the discrepancy include:

- Differences in hyperparameter settings, since we used our own codebase for evaluation.
- The inherent randomness in the model's output.

We will run further tests across multiple open-source platforms and consider updating the results in the paper and on the leaderboard. Thank you very much!
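One way to check whether a gap of this size is significant, sketched here under stated assumptions: if both evaluation runs (e.g. the paper's codebase and VLMEvalKit) produce per-sample 0/1 correctness for the same samples, a paired bootstrap gives a confidence interval for the accuracy difference. The function `bootstrap_accuracy_gap_ci` and its inputs are hypothetical and not part of VLMEvalKit or the NaturalBench code. NaturalBench's own metrics may aggregate over groups of samples rather than single samples, in which case the resampling unit should be the group.

```python
import random
from typing import List, Tuple


def bootstrap_accuracy_gap_ci(
    correct_a: List[int],
    correct_b: List[int],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Paired percentile-bootstrap CI for accuracy(run A) - accuracy(run B).

    Hypothetical helper: correct_a[i] and correct_b[i] are 0/1 scores for the
    SAME sample i under two evaluation runs.
    Returns (observed_gap, ci_low, ci_high).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("both runs must score the same non-empty sample set")
    n = len(correct_a)
    # Per-sample differences keep the pairing: each resample draws whole samples.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    gaps = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1))
    return observed, gaps[lo_idx], gaps[hi_idx]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic per-sample scores for illustration only; not real NaturalBench results.
    run_a = [1 if rng.random() < 0.60 else 0 for _ in range(1000)]
    run_b = [a if rng.random() < 0.9 else 1 - a for a in run_a]
    gap, lo, hi = bootstrap_accuracy_gap_ci(run_a, run_b)
    print(f"gap={gap:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

If the interval contains 0, the observed gap is consistent with sampling noise at that confidence level. Note that this only accounts for which samples were drawn; it does not separate hyperparameter differences from decoding randomness, which would need repeated runs with fixed settings.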

open-compass / VLMEvalKit

[Add] Benchmark: NaturalBench (NeurIPS24) #582