open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.73k stars, 398 forks

[Bug] mmbench dataset category error #388

Closed: jodie2235337 closed this issue 7 months ago

jodie2235337 commented 1 year ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

NA

Reproduce the problem - code/configuration example

  1. Download the MMBench dev and test sets from https://opencompass.org.cn/MMBench.
  2. Inspect the category field. Some entries appear to be miscategorized: for example, index=498, 500, 506, ... in the dev set do not look like they belong to the celebrity_recognition category. Is this by design, for some special purpose?
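
For anyone who wants to check the reported rows, here is a minimal sketch of how the category of specific indices could be looked up. It assumes the downloaded dev set is a tab-separated file with `index` and `category` columns; those column names, the helper `categories_for_indices`, and the sample rows are assumptions for illustration, not taken from the MMBench documentation.

```python
import csv
import io

# Base64-encoded image fields can exceed the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def categories_for_indices(tsv_file, indices, index_col="index", category_col="category"):
    """Return {index: category} for the requested rows of a tab-separated file.

    The column names are assumptions; adjust them to match the real header.
    """
    wanted = {str(i) for i in indices}
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {
        int(row[index_col]): row[category_col]
        for row in reader
        if row[index_col] in wanted
    }


# Tiny made-up sample standing in for the real dev set (values are illustrative only).
SAMPLE_TSV = (
    "index\tquestion\tcategory\n"
    "498\tplaceholder question\tcelebrity_recognition\n"
    "499\tplaceholder question\tobject_localization\n"
    "500\tplaceholder question\tcelebrity_recognition\n"
)

if __name__ == "__main__":
    result = categories_for_indices(io.StringIO(SAMPLE_TSV), [498, 500, 506])
    for idx, category in sorted(result.items()):
        print(idx, category)
```

With the real download, the same function can be given a file handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the dev set was saved. Indices missing from the file are simply left out of the result.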

Reproduce the problem - command or script

NA

Reproduce the problem - error message

NA

Other information

No response

tonysy commented 1 year ago

Thanks for the report. We will review the datasets.

jodie2235337 commented 11 months ago

@tonysy Is there any progress on this issue? Thank you for your response.