charliedream1 opened 5 days ago
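cmmlu is also missing from the final report. And what name should I use for flores_100? I tried several names, but its result in the final report is always empty.
Also, I want to test the two translation datasets listed on this page, but neither of them works. What names should I use?
https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html
Translation
Also, I suggest adding an average score at the end of the output report.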
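One way to add the suggested average is to post-process the summarizer output. The sketch below assumes the report has already been flattened into a hypothetical dict mapping dataset names to scores (the structure actually returned by `Summarizer.get_report_from_cfg` may differ). It skips datasets whose result came back empty, so a missing score such as the empty `cmmlu` entry does not distort the mean.

```python
from statistics import mean
from typing import Dict, Optional


def append_average(scores: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    """Hypothetical helper: return a copy of `scores` with an extra 'average' entry.

    `scores` is an assumed flattened report (dataset name -> score), where
    None marks a dataset whose result was empty.
    """
    # Only datasets that actually produced a score take part in the average.
    valid = [s for s in scores.values() if s is not None]
    result = dict(scores)  # do not mutate the caller's report
    # Round to two decimals for display; None if no dataset produced a score.
    result["average"] = round(mean(valid), 2) if valid else None
    return result


if __name__ == "__main__":
    report = {"ceval": 70.5, "mmlu": 65.0, "cmmlu": None}
    print(append_average(report))
```

Averaging only non-empty results is a design choice; treating empty results as 0 would instead penalize datasets the backend silently skipped.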
# Copyright (c) Alibaba, Inc. and its affiliates.
"""
1. Installation
    EvalScope: pip install evalscope[opencompass]

2. Download dataset to data/ folder
    wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
    unzip OpenCompassData-core-20240207.zip

3. Deploy model serving
    swift deploy --model_type qwen2-1_5b-instruct

4. Run eval task
"""
from evalscope.backend.opencompass import OpenCompassBackendManager
from evalscope.run import run_task
from evalscope.summarizer import Summarizer


def run_swift_eval():

    # List all datasets
    # e.g. ['mmlu', 'WSC', 'DRCD', 'chid', 'gsm8k', 'AX_g', 'BoolQ', 'cmnli', 'ARC_e', 'ocnli_fc', 'summedits', 'MultiRC', 'GaokaoBench', 'obqa', 'math', 'agieval', 'hellaswag', 'RTE', 'race', 'ocnli', 'strategyqa', 'triviaqa', 'WiC', 'COPA', 'piqa', 'nq', 'mbpp', 'csl', 'Xsum', 'CB', 'tnews', 'ARC_c', 'afqmc', 'eprstmt', 'ReCoRD', 'bbh', 'CMRC', 'AX_b', 'siqa', 'storycloze', 'humaneval', 'cluewsc', 'winogrande', 'lambada', 'ceval', 'bustm', 'C3', 'lcsts']
    print(
        f"** All datasets from OpenCompass backend: {OpenCompassBackendManager.list_datasets()}"
    )

    # Prepare the config
    """
    Attributes:
        `eval_backend`: Defaults to 'OpenCompass'
        `datasets`: list, refer to `OpenCompassBackendManager.list_datasets()`
        `models`: list of dict, each dict must contain `path` and `openai_api_base`
            `path`: reuse the value of '--model_type' in the command line `swift deploy`
            `openai_api_base`: the base URL of swift model serving
        `work_dir`: str, the directory to save the evaluation results, logs and summaries. Defaults to 'outputs/default'

        Refer to `opencompass.cli.arguments.ApiModelConfig` for other optional attributes.
    """
    # Option 1: Use dict format
    # Args:
    #   path: The path of the model, i.e. the swift `model_type`, e.g. 'llama3-8b-instruct'
    #   is_chat: True for chat model, False for base model
    #   key: The OpenAI API key of the model API, defaults to 'EMPTY'
    #   openai_api_base: The base URL of the OpenAI API, i.e. the swift model serving URL
    task_cfg = dict(
        eval_backend="OpenCompass",
        eval_config={
            "datasets": ["Xsum", "triviaqa", "cmmlu",
                         "OpenBookQA", "GaokaoBench", "flores_100",
                         "tnews", 'WSC', "hellaswag",
                         "ceval", "mmlu", "math", "gsm8k",
                         "humaneval", "mbpp", "bbh"],
            "models": [
                {
                    "path": "qwen2-7b-instruct",  # Please make sure the model is deployed
                    "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                    "is_chat": True,
                    "batch_size": 16,
                },
            ],
            "work_dir": "outputs/qwen2_eval_result",
            "limit": 10,
        },
    )

    # Option 2: Use yaml file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.yaml'

    # Option 3: Use json file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.json'

    # Run task
    run_task(task_cfg=task_cfg)

    # [Optional] Get the final report with summarizer
    print(">> Start to get the report with summarizer ...")
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f"\n>>The report list: {report_list}")


if __name__ == "__main__":
    run_swift_eval()
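This is my code.
cmmlu has been added and merged into main. For now, you can install from source: pip install git+https://github.com/modelscope/evalscope.git@main
OK, thanks. So the other datasets haven't been added yet either?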
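Because unsupported or misspelled names showed up as empty entries in the final report rather than stopping the run, it can help to check the `datasets` list before calling `run_task`. This is a minimal sketch: `split_datasets` is a hypothetical helper, and the supported list is hard-coded here, whereas in the real script it would come from `OpenCompassBackendManager.list_datasets()`. Exact, case-sensitive matching is an assumption about how the backend compares names.

```python
from typing import Iterable, List, Tuple


def split_datasets(requested: Iterable[str],
                   supported: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split requested names into (supported, unsupported), keeping order."""
    supported_set = set(supported)
    ok: List[str] = []
    missing: List[str] = []
    for name in requested:
        # Exact, case-sensitive comparison (assumed backend behaviour).
        if name in supported_set:
            ok.append(name)
        else:
            missing.append(name)
    return ok, missing


if __name__ == "__main__":
    # Stand-in for OpenCompassBackendManager.list_datasets().
    supported = ["mmlu", "ceval", "gsm8k", "obqa"]
    requested = ["mmlu", "cmmlu", "OpenBookQA", "flores_100"]
    ok, missing = split_datasets(requested, supported)
    print("will run:", ok)
    print("not supported by this backend:", missing)
```

Printing the unsupported names up front makes it obvious why a dataset later has no score in the report.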
--- Original email --- Sent: Tuesday, November 12, 2024, 5:48 PM. Subject: Re: [modelscope/evalscope] The dataset list printed by OpenCompass does not match the one on the website, so some datasets cannot be tested, e.g. cmmlu (Issue #191)
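Also, I want to test the two translation datasets listed on this page, but neither of them works. What names should I use?
https://evalscope.readthedocs.io/zh-cn/latest/get_started/supported_dataset.html
Translation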
- Flores
- IWSLT2017
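These two datasets are not supported yet. Some dataset names in supported_dataset differ from the datasets that are actually supported right now; the documentation will be aligned soon.
OK, looking forward to having them merged soon.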
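Until the documentation is aligned, a fuzzy lookup against the names the backend actually reports can catch spelling-level mismatches such as `cmmu` vs `cmmlu`. The sketch below uses Python's standard `difflib.get_close_matches`; `suggest_names` is a hypothetical helper, and the 0.6 cutoff is simply difflib's default. It is unlikely to bridge genuine renamings (for example `OpenBookQA` vs `obqa`), so an empty suggestion list still means checking the backend's dataset list by hand.

```python
import difflib
from typing import Dict, Iterable, List


def suggest_names(unknown: Iterable[str], supported: Iterable[str],
                  cutoff: float = 0.6) -> Dict[str, List[str]]:
    """Hypothetical helper: map each unknown name to up to 3 close supported names."""
    # Compare lowercased names, but report the backend's original spelling.
    by_lower = {name.lower(): name for name in supported}
    suggestions: Dict[str, List[str]] = {}
    for name in unknown:
        matches = difflib.get_close_matches(name.lower(), list(by_lower),
                                            n=3, cutoff=cutoff)
        suggestions[name] = [by_lower[m] for m in matches]
    return suggestions


if __name__ == "__main__":
    # Stand-in for OpenCompassBackendManager.list_datasets().
    supported = ["mmlu", "cmmlu", "ceval", "obqa", "gsm8k"]
    print(suggest_names(["cmmu", "OpenBookQA", "flores_100"], supported))
```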
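Feel free to open requests for the datasets you need, and we will prioritize integrating them. (Reusing an OpenCompass benchmark currently requires reproducing its results on the latest models and verifying that they are aligned.)
Datasets that are not in OpenCompassBackendManager.list_datasets() are reported as unsupported, yet the documentation lists them as supported under OpenCompass. How should these datasets be tested?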