lucasjinreal opened this issue 2 weeks ago
@lucasjinreal Which model and dataset are you evaluating? Can you provide more information? That may help us track down the problem.
We are testing the same in-house model; the evaluation dataset list was: eval_datasets='MMBench_DEV_CN MMStar MMBench_TEST_EN MMBench_TEST_CN MMBench_TEST_EN_V11 MMBench_TEST_CN_V11 MME'
We didn't change anything else. The outputs look like they tend to be longer now, and I am not sure what the reason could be.
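Roughly, the evaluation is launched like this (a minimal sketch: `my_inhouse_model` is a placeholder for the in-house model config, and the `--data`/`--model` flags follow the standard `run.py` interface of recent VLMEvalKit releases, so adjust to your local setup):

```bash
# Sketch only: dataset list from above, passed to VLMEvalKit's run.py.
eval_datasets='MMBench_DEV_CN MMStar MMBench_TEST_EN MMBench_TEST_CN MMBench_TEST_EN_V11 MMBench_TEST_CN_V11 MME'

# my_inhouse_model is a hypothetical placeholder for the in-house model entry.
python run.py --data $eval_datasets --model my_inhouse_model --verbose
```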
Hi @lucasjinreal, if you are testing an in-house model, the problem may not be related to VLMEvalKit. You could check whether the Python environment accidentally changed, e.g., whether the versions of some Python packages were bumped during the upgrade (see the sketch below).
AFAIK, there has been no change on the dataset side that would affect model behavior.
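A quick way to check for accidental environment drift, assuming pip is the package manager in use (the file and package names below are only examples):

```bash
# Snapshot the current environment so it can be compared against an earlier one.
pip freeze > env_after_upgrade.txt

# If a snapshot from before the upgrade exists, diff the two to spot version changes.
diff env_before_upgrade.txt env_after_upgrade.txt

# Or inspect a few key packages directly (package names are examples).
pip show vlmeval transformers torch | grep -E 'Name|Version'
```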
To clarify: after upgrading VLMEvalKit, the model tends to output more words than a single option on multiple-choice questions.