OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Motivation

This PR introduces the implementation of P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs (see paper link). P-MMEval provides parallel examples in 10 languages across multiple tasks, enabling consistent evaluation of LLMs' multilingual capabilities.
Modification
Configs:
Add files in configs/datasets/PMMEval for evaluation support. A dedicated dataset Python config file is created for each subset of P-MMEval (i.e., flores, humaneval-xl, mgsm, mhellaswag, mifeval, mlogiqa, mmmlu, and xnli); a sketch of such a file is shown after this list.
Add files in configs/summarizers and configs/summarizers/groups for summarizing the evaluation results on P-MMEval.
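For reference, below is a minimal sketch of what one of these per-subset config files might look like, following the usual OpenCompass config conventions. The class name `PMMEvalMGSMDataset`, the `path` value, and the column names are illustrative placeholders, not the exact identifiers introduced by this PR:

```python
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.openicl.icl_evaluator import AccEvaluator

# Hypothetical loader class; the actual class added in this PR may differ.
from opencompass.datasets import PMMEvalMGSMDataset

# Column names are placeholders for illustration.
pmmeval_mgsm_reader_cfg = dict(input_columns=['question'], output_column='answer')

pmmeval_mgsm_infer_cfg = dict(
    prompt_template=dict(
        type=PromptTemplate,
        template=dict(round=[dict(role='HUMAN', prompt='{question}')])),
    retriever=dict(type=ZeroRetriever),
    inferencer=dict(type=GenInferencer))

pmmeval_mgsm_eval_cfg = dict(evaluator=dict(type=AccEvaluator))

pmmeval_mgsm_datasets = [
    dict(
        abbr='pmmeval_mgsm',
        type=PMMEvalMGSMDataset,
        path='data/PMMEval',  # placeholder data path
        reader_cfg=pmmeval_mgsm_reader_cfg,
        infer_cfg=pmmeval_mgsm_infer_cfg,
        eval_cfg=pmmeval_mgsm_eval_cfg)
]
```

The files under configs/summarizers and configs/summarizers/groups then collect these per-subset abbreviations so that results can be reported as a single P-MMEval group.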
Datasets
Add files under datasets to support loading and evaluating each subset; a rough loader sketch follows below.
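A sketch of how such a loader could be registered, following the existing OpenCompass dataset pattern (a BaseDataset subclass registered via LOAD_DATASET). The class name, on-disk file layout, and the `lang` argument are assumptions for illustration, not the exact interface added by this PR:

```python
import json
import os

from datasets import Dataset

from opencompass.datasets.base import BaseDataset
from opencompass.registry import LOAD_DATASET


@LOAD_DATASET.register_module()
class PMMEvalMGSMDataset(BaseDataset):  # hypothetical name for illustration
    """Load one language split of the parallel mgsm subset of P-MMEval."""

    @staticmethod
    def load(path: str, lang: str = 'en') -> Dataset:
        # Assumed layout: <path>/mgsm/test-<lang>.jsonl, one JSON example per line.
        file_path = os.path.join(path, 'mgsm', f'test-{lang}.jsonl')
        examples = []
        with open(file_path, encoding='utf-8') as f:
            for line in f:
                examples.append(json.loads(line))
        return Dataset.from_list(examples)
```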
Checklist
Before PR:
[x] Pre-commit or other linting tools are used to fix the potential lint issues.
[ ] Bug fixes are fully covered by unit tests; the case that causes the bug should be added to the unit tests.
[ ] The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
[ ] The documentation has been modified accordingly, like docstring or example tutorials.
After PR:
[ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
[ ] CLA has been signed and all committers have signed the CLA in this PR.