open-compass / VLMEvalKit

An open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

[Fix] Update prompts for InternVL2 #474

Closed: czczup closed this pull request 1 month ago

czczup commented 1 month ago

Here is an explanation of the changes made:

  1. Since video datasets do not have a dump_image function, use_custom_prompt now returns False for all Video datasets (see the first sketch after this list).

  2. The prompts for image datasets were adjusted so that MME and HallusionBench are both categorized under the Y/N type.

  3. In set_max_num, dataset names are now grouped into lists keyed by max_num, making it easier to add new datasets later (see the second sketch at the end of this description).

  4. All hard-coded .cuda() calls were changed to .to(self.device).

  5. Images are now referred to as Image-{idx} in the prompt, and different images are separated by \n.

  6. A prompt for MVBench was added.
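
As an illustration of changes 1 and 2, here is a minimal sketch, not the actual InternVL2 implementation in this PR. Only use_custom_prompt and dump_image are named above; the VIDEO_DATASETS and YORN_DATASETS lists, the class name, and the Y/N instruction text are hypothetical stand-ins for the toolkit's own dataset-classification utilities.

```python
from typing import Optional

# Hypothetical dataset groupings; the real toolkit classifies datasets with its
# own utilities, these lists are only illustrative.
VIDEO_DATASETS = ['MMBench-Video', 'Video-MME', 'MVBench']
YORN_DATASETS = ['MME', 'HallusionBench']


class InternVL2PromptSketch:
    def use_custom_prompt(self, dataset: Optional[str]) -> bool:
        # Change 1: video datasets have no dump_image(), so always fall back to
        # the default prompt for them.
        if dataset is None or any(name in dataset for name in VIDEO_DATASETS):
            return False
        return True

    def build_yorn_prompt(self, question: str, dataset: str) -> str:
        # Change 2: MME and HallusionBench are both treated as Y/N datasets and
        # share the same answer instruction.
        if any(name in dataset for name in YORN_DATASETS):
            return question + ' Please answer yes or no.'
        return question
```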

After testing, the scores on the image benchmarks are unchanged. Among the video benchmarks, the Video-MME score is unchanged, the MMBench-Video score improved slightly, and the MVBench score still needs further testing.
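
For changes 3 to 5, a similarly hedged sketch is shown below. Only set_max_num, the .cuda()-to-.to(self.device) switch, and the Image-{idx} wording come from this PR; the concrete dataset names in the max_num mapping, the default max_num value, and the <image> placeholder are assumptions for illustration.

```python
import torch


class InternVL2RuntimeSketch:
    """Illustrative sketch of changes 3-5; everything not named in the PR
    description above is an assumption."""

    def __init__(self, model: torch.nn.Module, device: str = 'cuda'):
        self.device = device
        # Change 4: no hard-coded .cuda(); the model is moved to self.device.
        self.model = model.to(self.device)
        self.max_num = 6  # assumed default

    def set_max_num(self, dataset: str) -> None:
        # Change 3: dataset names are grouped into lists keyed by max_num, so a
        # new dataset only needs to be appended to the matching list.
        max_num_map = {
            12: ['ChartQA', 'OCRBench'],  # hypothetical grouping
            6: ['MMBench', 'MME'],        # hypothetical grouping
        }
        for max_num, names in max_num_map.items():
            if any(name in dataset for name in names):
                self.max_num = max_num
                return

    @staticmethod
    def tag_images(num_images: int, question: str) -> str:
        # Change 5: each image is referenced as Image-{idx}, and images are
        # separated by '\n'.
        tags = '\n'.join(f'Image-{idx}: <image>' for idx in range(1, num_images + 1))
        return tags + '\n' + question
```

For example, tag_images(2, 'What differs between the two pictures?') would produce 'Image-1: <image>\nImage-2: <image>\nWhat differs between the two pictures?', matching the Image-{idx} referencing scheme described in change 5.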