open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

InternVL2 Truncated Output #466

Closed TJ-Ouyang closed 1 month ago

TJ-Ouyang commented 2 months ago

[screenshot] When I use InternVL2-40B for single-image inference on my own data, the output is truncated.

kennymckormick commented 2 months ago

Hi @TJ-Ouyang, that is strange: the number of output tokens is apparently much smaller than the max_tokens setting. Could you share the contents of your test.py so we can help with debugging?

TJ-Ouyang commented 2 months ago

> Hi @TJ-Ouyang, that is strange: the number of output tokens is apparently much smaller than the max_tokens setting. Could you share the contents of your test.py so we can help with debugging?

We solved the problem by setting `kwargs_default = dict(do_sample=False, max_new_tokens=512, top_p=None, num_beams=1)` just before the `self.model.chat` call at line 359 of internvl_chat.py. Using ipdb, we found that execution never entered the `chat_inner` function, where `max_new_tokens` was previously set. [screenshot]
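The workaround above can be sketched as follows. This is a minimal, self-contained illustration of the pattern (forcing sane generation kwargs right before the chat call, since `chat_inner`, which normally sets `max_new_tokens`, was never reached); the `chat_with_defaults` helper and the stand-in `fake_chat` are hypothetical, not the actual VLMEvalKit or InternVL2 code.

```python
def chat_with_defaults(model_chat, tokenizer, pixel_values, question):
    # Greedy decoding with enough room for a full answer, mirroring
    # the kwargs_default from the issue.
    kwargs_default = dict(do_sample=False, max_new_tokens=512,
                          top_p=None, num_beams=1)
    return model_chat(tokenizer, pixel_values, question,
                      generation_config=kwargs_default)

# Stand-in for self.model.chat that just echoes its generation config,
# so the override is visible without loading a real model.
def fake_chat(tokenizer, pixel_values, question, generation_config=None):
    return generation_config

cfg = chat_with_defaults(fake_chat, None, None, "Describe the image.")
print(cfg)
```

With the real model, `model_chat` would be `self.model.chat` and the config would cap generation at 512 new tokens instead of the model's small default.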

kennymckormick commented 1 month ago

@TJ-Ouyang, I have figured out the cause of this problem: the default max_new_tokens for InternVL2-40B is just 20. I have changed the default kwargs so that other users will not run into this problem again.
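The library-side fix amounts to shipping sensible default generation kwargs while still letting caller-supplied values win. A minimal sketch of that merge pattern, where `build_generation_config` and the default value of 1024 are illustrative assumptions, not the actual VLMEvalKit code or its chosen default:

```python
def build_generation_config(**user_kwargs):
    # Defaults well above the model's built-in max_new_tokens of 20,
    # so output is no longer truncated out of the box.
    kwargs_default = dict(do_sample=False, max_new_tokens=1024,
                          top_p=None, num_beams=1)
    kwargs_default.update(user_kwargs)  # explicit user values override
    return kwargs_default

defaults = build_generation_config()
overridden = build_generation_config(max_new_tokens=256)
print(defaults, overridden)
```

Because the merge happens in the toolkit rather than in each user script, the per-call workaround from the earlier comment becomes unnecessary.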

kennymckormick commented 1 month ago

https://github.com/open-compass/VLMEvalKit/blob/e542caea3cf5d362131bccdb97a70ce50839c332/vlmeval/vlm/internvl_chat.py#L177