open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

InternVL2 Truncated Output #466

Closed · TJ-Ouyang closed this issue 3 days ago

TJ-Ouyang commented 1 week ago

When I use InternVL2-40B to run single-image inference on my own data, the output is truncated. (screenshot of the truncated output attached)

kennymckormick commented 1 week ago

Hi @TJ-Ouyang, that is strange: apparently the number of output tokens is much smaller than the max_tokens you set. Could you please share the content of your test.py so we can help with debugging?

TJ-Ouyang commented 1 week ago

> Hi @TJ-Ouyang, that is strange: apparently the number of output tokens is much smaller than the max_tokens you set. Could you please share the content of your test.py so we can help with debugging?

We have solved the problem by setting `kwargs_default = dict(do_sample=False, max_new_tokens=512, top_p=None, num_beams=1)` before the `self.model.chat` call at line 359 in internvl_chat.py. Using ipdb, we found that execution never entered the `chat_inner` function, where `max_new_tokens` used to be set. (screenshot of the ipdb session attached)
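For anyone on an older commit who hits the same truncation, the workaround amounts to the sketch below. The chat call signature is assumed from the upstream InternVL API (`model.chat(tokenizer, pixel_values, question, generation_config)`); the surrounding code in internvl_chat.py may differ, so treat this as a sketch rather than a verbatim patch.

```python
# Sketch of the workaround inside vlmeval/vlm/internvl_chat.py, placed just
# before the self.model.chat call (around line 359 in the version discussed).
# The chat signature here is assumed from the upstream InternVL API.
kwargs_default = dict(do_sample=False, max_new_tokens=512, top_p=None, num_beams=1)
response = self.model.chat(
    self.tokenizer,
    pixel_values,
    question,
    generation_config=kwargs_default,
)
```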

kennymckormick commented 3 days ago

@TJ-Ouyang, I have figured out the cause of this problem: the default max_new_tokens for InternVL2-40B is just 20. I have changed the default kwargs value so that other users will not run into this problem again.
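A fix along those lines boils down to keeping a default generation config with a larger max_new_tokens and letting user-supplied kwargs override it. The helper below is hypothetical (the toolkit sets these defaults inline in the model class; see the permalink in the next comment for the actual change), and the value 1024 is an illustrative assumption.

```python
def build_generation_config(**kwargs):
    # Start from defaults with a larger max_new_tokens (the model's built-in
    # default of 20 is what caused the truncation), then let caller-supplied
    # kwargs take precedence over the defaults.
    kwargs_default = dict(do_sample=False, max_new_tokens=1024, top_p=None, num_beams=1)
    kwargs_default.update(kwargs)
    return kwargs_default

# Example: build_generation_config(max_new_tokens=512)
```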

kennymckormick commented 3 days ago

https://github.com/open-compass/VLMEvalKit/blob/e542caea3cf5d362131bccdb97a70ce50839c332/vlmeval/vlm/internvl_chat.py#L177