open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.06k stars · 428 forks

[Feature] 多模态榜单 #502

Closed lj163ucas closed 8 months ago

lj163ucas commented 1 year ago

Describe the feature

The multimodal leaderboard previously seemed to aggregate several benchmarks (similar to how the LLM leaderboard combines multiple datasets), but now only MMBench is listed. Have the other benchmark evaluations been moved elsewhere? Is it possible to bring them back?

Will you implement it?

kennymckormick commented 1 year ago

Hi @lj163ucas, we are still evaluating those VLMs on MME and SEEDBench; the results will be available in 1 to 2 weeks. The data on the previous page were just mock data.

lj163ucas commented 1 year ago

Great, thanks!

lj163ucas commented 12 months ago

Is it going well?

kennymckormick commented 11 months ago

Hi @lj163ucas, evaluation of those benchmarks is now supported in VLMEvalKit (including the evaluation results). Besides, we will add a multi-modal leaderboard to our official website in the next few days.

cuidongli commented 9 months ago

The multimodal evaluation described at https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/multimodal_eval.html uses `python run.py configs/multimodal/tasks.py --mm-eval` inside opencompass. Is this still supported? It currently fails with an error, and the leaderboard says VLMEvalKit was used.

tonysy commented 8 months ago

> The multimodal evaluation described at https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/multimodal_eval.html uses `python run.py configs/multimodal/tasks.py --mm-eval` inside opencompass. Is this still supported? It currently fails with an error, and the leaderboard says VLMEvalKit was used.

Please try VLMEvalKit; VLM evaluation has been deprecated in the opencompass repo.
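For readers hitting the same error: a minimal sketch of the suggested migration, assuming the `--data`/`--model` flags and the `MMBench_DEV_EN`/`qwen_chat` names shown in the VLMEvalKit README at the time; check the current VLMEvalKit docs, as benchmark and model identifiers may have changed since.

```shell
# Install VLMEvalKit, which replaces the deprecated
# "python run.py configs/multimodal/tasks.py --mm-eval" path in opencompass
git clone https://github.com/open-compass/VLMEvalKit.git
cd VLMEvalKit
pip install -e .

# Evaluate a VLM on a benchmark; the --data and --model values below are
# assumptions taken from the VLMEvalKit README, not from this thread
python run.py --data MMBench_DEV_EN --model qwen_chat --verbose
```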