open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

Inconsistent max_new_tokens #356

Open amitbcp opened 1 month ago

amitbcp commented 1 month ago

Across different LMMs, the max new tokens setting differs. I believe we should have a consistent MAX_NEW_TOKENS across the project, set to 512 or 1024.

If it makes sense, I can create a PR to modify all of them.
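The proposal above could be sketched roughly as follows: a single project-wide default that individual model wrappers inherit unless they deliberately opt out. All names here (`DEFAULT_MAX_NEW_TOKENS`, `BaseModel`, the wrapper classes) are illustrative, not actual VLMEvalKit identifiers.

```python
# Hypothetical sketch of the proposed change, not VLMEvalKit's actual code.
# A shared default replaces per-wrapper hard-coded values such as 500 or 2048.
DEFAULT_MAX_NEW_TOKENS = 512


class BaseModel:
    def __init__(self, max_new_tokens=None):
        # Fall back to the shared default unless a wrapper overrides explicitly.
        if max_new_tokens is None:
            max_new_tokens = DEFAULT_MAX_NEW_TOKENS
        self.max_new_tokens = max_new_tokens

    def generation_kwargs(self):
        # Passed through to the underlying model's generate() call.
        return {"max_new_tokens": self.max_new_tokens}


class Phi3VisionExample(BaseModel):
    # A wrapper that previously hard-coded its own value (e.g. 500)
    # would now simply inherit the shared default.
    pass


class CogVLMExample(BaseModel):
    def __init__(self):
        # A wrapper can still override deliberately when longer outputs
        # are genuinely needed.
        super().__init__(max_new_tokens=2048)
```

With this shape, changing `DEFAULT_MAX_NEW_TOKENS` in one place would move every non-overriding wrapper at once, which is the consistency the issue asks for.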

kennymckormick commented 1 month ago

Hi @amitbcp, are there any specific cases you are referring to? I think for most VLMs we adopt a MAX_NEW_TOKENS >= 512.

amitbcp commented 1 month ago

@kennymckormick:

For example:

  1. In MiniCPM we have defined a max length, which we haven't done for other models: https://github.com/open-compass/VLMEvalKit/blob/22991ca6109c5d4e65bc4a1a9273234d23c3e13f/vlmeval/vlm/minicpm_v.py#L186

  2. For Phi3 it's 500: https://github.com/open-compass/VLMEvalKit/blob/22991ca6109c5d4e65bc4a1a9273234d23c3e13f/vlmeval/vlm/phi3_vision.py#L36

  3. For Qwen we also adjust the token count: https://github.com/open-compass/VLMEvalKit/blob/22991ca6109c5d4e65bc4a1a9273234d23c3e13f/vlmeval/vlm/qwen_vl.py#L38

  4. For BunnyLlama: https://github.com/open-compass/VLMEvalKit/blob/22991ca6109c5d4e65bc4a1a9273234d23c3e13f/vlmeval/vlm/bunnyllama3.py#L131

  5. For CogVLM we use 2048: https://github.com/open-compass/VLMEvalKit/blob/22991ca6109c5d4e65bc4a1a9273234d23c3e13f/vlmeval/vlm/cogvlm.py#L24

and more.

So should we set a consistent generation length for all models, so that they are evaluated on an equal footing?