open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.73k stars, 400 forks

[Feature] Support QuALITY dataset #949

Closed. Ezra-Yu closed this issue 6 months ago.

Ezra-Yu commented 6 months ago

Describe the feature

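Claude 3's evaluation includes QuALITY ("QuALITY: Question Answering with Long Input Texts, Yes!"), a long-document test set (about 5k tokens per input on average). It is human-annotated and of fairly high quality. We would like OpenCompass to support it.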

arXiv: https://arxiv.org/abs/2112.08608
GitHub: https://github.com/nyu-mll/quality
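For reference, here is a minimal sketch of what evaluating on QuALITY involves: turning each article/question pair into a four-option multiple-choice prompt and scoring letter accuracy. It assumes the JSONL layout described in the nyu-mll/quality repository (one record per line with an `article` string and a `questions` list whose entries carry `question`, `options`, and a 1-based `gold_label`). The function names `iter_items` and `accuracy` are hypothetical, and this is not the OpenCompass implementation.

```python
import json
from typing import Iterator

# Assumed QuALITY record layout (verify against the nyu-mll/quality README):
# {"article": str, "questions": [{"question": str, "options": [str x 4], "gold_label": int (1-based)}]}
LETTERS = "ABCD"


def iter_items(jsonl_text: str) -> Iterator[tuple[str, str]]:
    """Hypothetical helper: yield (prompt, gold_letter) pairs, one per labelled question."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for q in record["questions"]:
            if "gold_label" not in q:
                continue  # unlabelled questions cannot be scored
            options = "\n".join(
                f"{LETTERS[i]}. {opt}" for i, opt in enumerate(q["options"])
            )
            prompt = (
                f"{record['article']}\n\n"
                f"Question: {q['question']}\n{options}\n"
                "Answer with a single letter."
            )
            # Convert the assumed 1-based label into a letter (1 -> "A").
            yield prompt, LETTERS[q["gold_label"] - 1]


def accuracy(predictions: list[str], golds: list[str]) -> float:
    """Hypothetical metric: fraction of predictions whose first non-space character matches the gold letter."""
    if not golds:
        return 0.0
    hits = sum(p.strip()[:1].upper() == g for p, g in zip(predictions, golds))
    return hits / len(golds)
```

Usage: build the items with `list(iter_items(path.read_text()))`, send each prompt to the model, and pass the model outputs together with the gold letters to `accuracy`. Because QuALITY articles are long (the issue cites about 5k tokens), the model's context window must fit the whole article plus question.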

Will you implement this feature yourself?

jingmingzhuo commented 6 months ago

https://github.com/open-compass/opencompass/pull/976