Closed 13416157913 closed 8 months ago
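- The MMLU college_chemistry subset has exactly 100 questions. Note that some questions in this subset span multiple lines, so the file has more lines than questions. The statistics here also list 100 questions.
- See https://github.com/open-compass/opencompass/blob/main/configs/eval_mmlu_with_zero_retriever_overwritten.py
- My guess is that 445 is the progress bar of one sub-task that is currently running, not the overall progress bar.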
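For the 0-shot question (second point), the general idea of turning a few-shot dataset config into a zero-shot one can be sketched with plain dictionaries. This is only an illustration, not the contents of the linked config: the dict layout (`infer_cfg` / `retriever`), the string type names, and the helper `to_zero_shot` are assumptions standing in for the real OpenCompass objects, and the linked file remains the authoritative reference.

```python
from copy import deepcopy


def to_zero_shot(dataset_cfgs):
    """Return copies of the dataset configs with a zero-shot retriever.

    Hypothetical helper: the keys and string type names are placeholders,
    not the real OpenCompass classes.
    """
    patched = []
    for cfg in dataset_cfgs:
        cfg = deepcopy(cfg)  # leave the original 5-shot config untouched
        cfg["infer_cfg"]["retriever"] = {"type": "ZeroRetriever"}
        patched.append(cfg)
    return patched


# Toy stand-in for one 5-shot dataset entry (not a real OpenCompass config).
five_shot = [
    {
        "abbr": "college_chemistry",
        "infer_cfg": {
            "retriever": {"type": "FixKRetriever", "fix_id_list": [0, 1, 2, 3, 4]},
        },
    }
]
zero_shot = to_zero_shot(five_shot)
```

In a real config the prompt template probably also needs adjusting so that it no longer refers to in-context examples; the linked file shows how the maintainers handle that.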
Is hendrycksTest-college_chemistry the same as MMLU-college_chemistry?
Yes, hendrycksTest-college_chemistry is MMLU-college_chemistry.
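On the 116-versus-100 discrepancy for this subset: counting physical lines in a CSV file overcounts whenever a quoted field contains a line break. A minimal sketch of the difference, using a made-up two-question sample instead of the real data file (the helper name `count_lines_and_records` and the sample rows are illustrative assumptions):

```python
import csv
import io


def count_lines_and_records(text):
    """Return (physical line count, CSV record count) for CSV text."""
    physical_lines = len(text.splitlines())
    # csv.reader keeps a quoted field together even if it spans several lines.
    records = sum(1 for _ in csv.reader(io.StringIO(text, newline="")))
    return physical_lines, records


# Made-up sample: the first question has a line break inside a quoted field.
sample = (
    '"Which statement is true\nfor an ideal gas?",A1,B1,C1,D1,A\n'
    '"What is the pH of pure water?",5,6,7,8,C\n'
)
lines, records = count_lines_and_records(sample)
print(lines, records)  # prints: 3 2
```

Running the same kind of count on the local college_chemistry file (opened with `newline=""`) would show whether the extra lines come from multi-line questions.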
On question 3: 445 appears because every dataset was run, but hadn't I already specified --datasets mmlu_gen ceval_gen?
The contents of the config file configs/eval_test.py are as follows:

    from mmengine.config import read_base
    from opencompass.models import OpenAI
    from opencompass.partitioners import NaivePartitioner
    from opencompass.runners import LocalRunner
    from opencompass.tasks import OpenICLInferTask

    with read_base():
        from .datasets.collections.chat_medium import datasets
        from .summarizers.medium import summarizer

    api_meta_template = dict(
        round=[
            dict(role='HUMAN', api_role='HUMAN'),
            dict(role='BOT', api_role='BOT', generate=True),
        ],
    )

    models = [
        dict(abbr='deng',
             type=OpenAI,
             path='deng',
             key='http://127.0.0.1:19201',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
             meta_template=api_meta_template,
             query_per_second=1,
             max_out_len=4096,
             max_seq_len=4096,
             batch_size=8),
    ]

    infer = dict(
        partitioner=dict(type=NaivePartitioner),
        runner=dict(
            type=LocalRunner,
            max_num_workers=1,
            task=dict(type=OpenICLInferTask)),
    )
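The gap is even larger for the C-Eval dataset. For example, for ceval-college_chemistry the evaluation log contains only 24 questions, while the dataset file college_chemistry_test.csv (in the test directory of the C-Eval dataset) has 224 questions. The difference is far too large to be explained by multi-line questions. I ask because I am worried that OpenCompass evaluates fewer C-Eval questions, in which case the final score would not be comparable with the score measured officially by C-Eval (since the number of test questions differs).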
C-Eval has two splits, val and test.
We use ceval-val by default:
Problem solved, thanks!
The problem has been resolved, thank you!
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
1
Reproduces the problem - code/configuration sample
1
Reproduces the problem - command or script
1
Reproduces the problem - error message
1
Other information
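Command: python run.py configs/eval_test.py --datasets mmlu_gen ceval_gen (evaluating via the API)
Question 1: The mmlu_gen evaluation log for college_chemistry contains only 100 questions, whereas the college_chemistry dataset in the local /data directory (unzipped) has 116, so 16 questions appear to be missing.
Question 2: The evaluation uses 5-shot by default. How can I change it to 0-shot?
Question 3: MMLU has 57 categories and C-Eval has 52, which makes 109 in total. Why does the evaluation progress bar show 445, as in the screenshot below: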