open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.62k stars 383 forks

Please Add average scores for C-Eval #208

Closed (Quehry closed this issue 1 year ago)

Quehry commented 1 year ago

Describe the feature

When I test LLMs on C-Eval, I mainly care about the average score on the validation set, so please add this feature. There is also the C-Eval Hard subset; please add it to C-Eval as well. You can refer to https://github.com/SJTU-LIT/ceval

Would you like to implement this feature yourself?

gaotongxiao commented 1 year ago

It has been implemented in the summarizer module. Just include this line in your config and rerun your experiment:

https://github.com/InternLM/opencompass/blob/4fc1701209adad832f7fca91640fac34331b8e29/configs/eval_internlm_7b.py#L9
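For readers who want a feel for what an averaged C-Eval score amounts to, here is a minimal sketch in plain Python. It does not use OpenCompass and is not its summarizer implementation. The `summarize` helper, the layout of the group dictionaries, the subset names, and the scores are all assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical per-subset accuracies (made-up numbers, illustrative subset names).
scores = {
    "ceval-computer_network": 50.0,
    "ceval-operating_system": 60.0,
    "ceval-high_school_physics": 40.0,
}

# Hypothetical groups: each has a name and the list of subsets it averages over.
summary_groups = [
    {"name": "ceval", "subsets": list(scores)},
    {"name": "ceval-hard", "subsets": ["ceval-high_school_physics"]},
]


def summarize(scores, summary_groups):
    """Return the unweighted mean score per group.

    Groups that reference a subset with no recorded score are skipped,
    so a partial run never reports a misleading average.
    """
    averages = {}
    for group in summary_groups:
        subsets = group["subsets"]
        if all(name in scores for name in subsets):
            averages[group["name"]] = mean(scores[name] for name in subsets)
    return averages


averages = summarize(scores, summary_groups)
print(averages)
```

This is an unweighted average over subsets. Whether the real summarizer weights subsets by size is not stated in this thread, so check the linked config and the summarizer source before comparing numbers.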

gaotongxiao commented 1 year ago

@Leymore We need to have a doc for summarizer.

Quehry commented 1 year ago

> @Leymore We need to have a doc for summarizer.

ok, thanks a lot!