open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.18k stars 446 forks

[Feature] Add P-MMEval #1714

Open wanyu2018umac opened 1 day ago

wanyu2018umac commented 1 day ago

Motivation

This PR adds an implementation of P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (see paper link). The P-MMEval benchmark supports evaluating the multilingual capabilities of LLMs, with examples in 10 languages.

Modification

Checklist

Before PR:

After PR: