open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] The mmlu_gen evaluation log shows only 100 questions for college_chemistry, while the local college_chemistry dataset under /data (extracted from the zip) has 116 questions, 16 fewer #899

Closed: 13416157913 closed this issue 8 months ago

13416157913 commented 8 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

1

Reproduces the problem - code/configuration sample

1

Reproduces the problem - command or script

1

Reproduces the problem - error message

1

Other information

Run command: python run.py configs/eval_test.py --datasets mmlu_gen ceval_gen (evaluating through the API backend)

Question 1: the mmlu_gen evaluation log contains only 100 questions for college_chemistry, but the local college_chemistry dataset under /data (extracted from the zip) has 116 questions, so 16 appear to be missing.

Question 2: evaluation defaults to 5-shot; how do I change it to 0-shot?

Question 3: MMLU has 57 subsets and C-Eval has 52, 109 in total, so why does the progress bar show 445? (screenshot omitted)
Leymore commented 8 months ago
  1. MMLU college_chemistry really does contain only 100 questions; note that some questions in this subset span multiple lines in the CSV. The statistics linked here also count 100 questions.
  2. Please refer to https://github.com/open-compass/opencompass/blob/main/configs/eval_mmlu_with_zero_retriever_overwritten.py
  3. My guess is that 445 is the progress bar of a single sub-task currently running, not the overall progress.
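For reference, the pattern in the linked config overwrites each dataset's retriever with ZeroRetriever. A minimal sketch under stated assumptions (the exact mmlu_gen module name varies by checkout; use whichever mmlu_gen config ships with your version):

```python
from mmengine.config import read_base
from opencompass.openicl.icl_retriever import ZeroRetriever

with read_base():
    # Module name is an assumption; pick the mmlu_gen config in your checkout.
    from .datasets.mmlu.mmlu_gen import mmlu_datasets

for d in mmlu_datasets:
    # Replace the default 5-shot retriever with a 0-shot one.
    d['infer_cfg']['retriever'] = dict(type=ZeroRetriever)
```

This is a config fragment, not a standalone script; it belongs inside an OpenCompass eval config alongside your model definitions.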
13416157913 commented 8 months ago

Is hendrycksTest-college_chemistry the same as MMLU-college_chemistry?

Leymore commented 8 months ago

Yes, hendrycksTest-college_chemistry is MMLU-college_chemistry.

13416157913 commented 8 months ago

Regarding question 3: the 445 is because all datasets are being run, but didn't I already specify --datasets mmlu_gen ceval_gen? (screenshot omitted)

The content of the configs/eval_test.py config file is as follows:

```python
from mmengine.config import read_base
from opencompass.models import OpenAI
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer

# GPT4 needs a special humaneval postprocessor
from opencompass.datasets.humaneval import humaneval_gpt_postprocess

for _dataset in datasets:
    if _dataset['path'] == 'openai_humaneval':
        _dataset['eval_cfg']['pred_postprocessor']['type'] = humaneval_gpt_postprocess

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
)

models = [
    dict(abbr='deng', type=OpenAI, path='deng',
         key='http://127.0.0.1:19201',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
         meta_template=api_meta_template, query_per_second=1,
         max_out_len=4096, max_seq_len=4096, batch_size=8),
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=1,
        task=dict(type=OpenICLInferTask)),
)
```
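Note that the chat_medium collection imported above pulls in every dataset in that collection, which is where a task count far above 109 can come from. A minimal sketch of limiting the config itself to just MMLU and C-Eval (the module names under configs/datasets/ are assumptions; check your checkout):

```python
from mmengine.config import read_base

with read_base():
    # Exact module names are assumptions; see configs/datasets/ in your checkout.
    from .datasets.mmlu.mmlu_gen import mmlu_datasets
    from .datasets.ceval.ceval_gen import ceval_datasets

# Only the 57 MMLU subsets and 52 C-Eval subsets end up in the task list.
datasets = [*mmlu_datasets, *ceval_datasets]
```

This replaces the chat_medium import, so the run contains only the datasets you intend to score.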

13416157913 commented 8 months ago


The gap for C-Eval is even larger. For example, for the ceval-college_chemistry dataset the evaluation log contains only 24 questions, while the dataset file college_chemistry_test.csv (in the C-Eval test directory) has 224 questions. That difference is far too big to be a line-spanning issue. I am asking because I worry that OpenCompass evaluates C-Eval on fewer questions, in which case the final score would not be comparable to the official C-Eval score (since the number of test questions differs).

tonysy commented 8 months ago

C-Eval has two splits, val and test.

tonysy commented 8 months ago

We use ceval-val as default: https://github.com/open-compass/opencompass/blob/6d04decab459f9879e843c332be601871d294fe0/configs/datasets/ceval/ceval_gen_5f30c7.py#L65
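In other words, C-Eval's test answers are not publicly released, so OpenCompass scores the val split by default (which explains the smaller question counts in the log). A hedged sketch of the reader configuration shape around the linked line; the field names follow ceval_gen_5f30c7.py but treat the exact keys as assumptions and check the file itself:

```python
# Sketch of the C-Eval reader config; keys are assumptions, verify against
# configs/datasets/ceval/ceval_gen_5f30c7.py in your checkout.
ceval_reader_cfg = dict(
    input_columns=['question', 'A', 'B', 'C', 'D'],
    output_column='answer',
    train_split='dev',  # few-shot examples are drawn from the dev split
    test_split='val',   # scoring runs on val; test answers are not released
)
```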

13416157913 commented 8 months ago


That solves it, thanks!

13416157913 commented 8 months ago

The problem is solved, thank you!