open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs); supports ~100 VLMs and 30+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

Did the generation params change? #269

lucasjinreal commented 2 weeks ago

After upgrading VLMEvalKit, the model tends to output more words than a single option letter on multiple-choice questions.

junming-yang commented 2 weeks ago

@lucasjinreal Which model and datasets are you evaluating? Can you provide more information? That would help us track down the problem.

lucasjinreal commented 2 weeks ago

We are testing the same in-house model; the evaluation set was: eval_datasets='MMBench_DEV_CN MMStar MMBench_TEST_EN MMBench_TEST_CN MMBench_TEST_EN_V11 MMBench_TEST_CN_V11 MME '

We didn't change anything else; the model simply seems to produce longer outputs, and I am not sure what the reason could be.
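
If it helps, here is a minimal sketch of the kind of decoding setup we would expect to force a single-letter answer. It assumes a Hugging Face transformers-style interface; the model name and prompt are placeholders, not our actual stack:

```python
# Hypothetical repro: a small max_new_tokens budget should keep a
# multiple-choice answer to just the option letter. Model name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your/inhouse-model")
model = AutoModelForCausalLM.from_pretrained("your/inhouse-model")

prompt = "Question: ...\nOptions:\nA. ...\nB. ...\nAnswer with the option letter."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=5,  # small budget; a larger default would allow verbose answers
    do_sample=False,   # greedy decoding for reproducible outputs
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

If an upgrade silently changed generation kwargs like these, longer outputs would be the expected symptom.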

kennymckormick commented 2 weeks ago

Hi @lucasjinreal, if you are testing an in-house model, the problem may not be related to VLMEvalKit. You can check whether the Python environment changed accidentally (for example, the version of some Python packages).
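
A quick standard-library way to check for dependency drift is to snapshot package versions in the old and new environments and compare them (the package list below is only an example):

```python
# Snapshot installed versions of packages that commonly affect model
# behavior; run this in both environments and diff the output.
from importlib import metadata

def snapshot(packages):
    """Return {package: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Illustrative list; extend with whatever your model stack depends on.
print(snapshot(["transformers", "torch", "tokenizers", "accelerate"]))
```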

kennymckormick commented 2 weeks ago

AFAIK, there is no change on the dataset side that would affect the model behavior.