open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
3.83k stars 406 forks

Error when testing the GPQA dataset with Qwen2-72B-Base: NotImplementedError: OpenAI does not support ppl-based evaluation yet, try gen-based instead. #1526

Open 13416157913 opened 2 weeks ago

13416157913 commented 2 weeks ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

1

Reproduces the problem - code/configuration sample

from mmengine.config import read_base
from opencompass.models import OpenAI
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    from .datasets.collections.chat_medium import datasets
    from .summarizers.medium import summarizer
    from .datasets.gpqa.gpqa_gen import gpqa_datasets

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
)
datasets = [*gpqa_datasets]
models = [
    dict(abbr='xxxx',
         type=OpenAI,
         path='xxxx',
         key='http://xxx.xxx.xxx.xxx:xxxx',  # The key will be obtained from $OPENAI_API_KEY, but you can write down your key here as well
         meta_template=api_meta_template,
         query_per_second=1,
         max_out_len=8192,
         max_seq_len=8192,
         batch_size=8),
]
infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=1,
        task=dict(type=OpenICLInferTask)),
)

Reproduces the problem - command or script

1

Reproduces the problem - error message

Traceback (most recent call last):
  File "/home/opencompass/opencompass/tasks/openicl_infer.py", line 152, in <module>
    inferencer.run()
  File "/home/opencompass/opencompass/tasks/openicl_infer.py", line 81, in run
    self._inference()
  File "/home/opencompass/opencompass/tasks/openicl_infer.py", line 125, in _inference
    inferencer.inference(retriever,
  File "/home/anaconda3/lib/python3.10/site-packages/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 159, in inference
    sub_res = self.model.get_ppl_from_template(sub_prompt_list).tolist()
  File "/home/anaconda3/lib/python3.10/site-packages/opencompass/models/base.py", line 152, in get_ppl_from_template
    return self.get_ppl(inputs, mask_length)
  File "/home/anaconda3/lib/python3.10/site-packages/opencompass/models/base_api.py", line 124, in get_ppl
    raise NotImplementedError(f'{self.__class__.__name__} does not support'
NotImplementedError: OpenAI does not support ppl-based evaluation yet, try gen-based instead.

Other information

The config file used for the GPQA dataset is gpqa_ppl_6bf57a.py. Error message: NotImplementedError: OpenAI does not support ppl-based evaluation yet, try gen-based instead.
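For readers following the traceback, here is a minimal sketch of the pattern it suggests: an API-backed model class exposes a generation method but deliberately raises from `get_ppl`, because a remote API typically does not return the logits needed to score a fixed answer. The classes below (`SketchAPIModel` and the stand-in `OpenAI`) are hypothetical illustrations written for this thread, not the actual OpenCompass source.

```python
class SketchAPIModel:
    """Hypothetical stand-in for an API-backed model wrapper."""

    def generate(self, inputs, max_out_len):
        # A real wrapper would call the remote API here.
        raise NotImplementedError

    def get_ppl(self, inputs, mask_length=None):
        # PPL needs per-token log-probabilities, which this sketch
        # assumes the remote API does not expose.
        raise NotImplementedError(
            f'{self.__class__.__name__} does not support '
            'ppl-based evaluation yet, try gen-based instead.')


class OpenAI(SketchAPIModel):
    """Stand-in named like the class in the traceback (assumption)."""

    def generate(self, inputs, max_out_len):
        # Stubbed output so the sketch runs without network access.
        return ['stub answer' for _ in inputs]


model = OpenAI()
outputs = model.generate(['Question text'], max_out_len=16)

try:
    model.get_ppl(['Question text'])
except NotImplementedError as err:
    message = str(err)
    print(message)
```

In this sketch, generation works while any PPL request fails. This matches the behaviour reported above, where the PPL inferencer is selected by the dataset config rather than by the model.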

MaiziXiao commented 2 weeks ago

The error message is straightforward. The OpenAI model does not support PPL-based evaluation (which relies on output logits); try a generation-based GPQA setting instead (which relies only on input and output strings).
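To make the distinction concrete, here is a rough sketch of the two evaluation styles on a single multiple-choice question. The log-probabilities are made-up toy numbers, and the function names and the answer-parsing regex are assumptions for illustration only; they are not OpenCompass APIs.

```python
import math
import re


def pick_by_ppl(option_token_logprobs):
    """PPL-based: score each candidate answer by perplexity, pick the lowest.

    Needs per-token log-probabilities from the model, i.e. access to logits.
    """
    def perplexity(logprobs):
        return math.exp(-sum(logprobs) / len(logprobs))

    return min(option_token_logprobs,
               key=lambda opt: perplexity(option_token_logprobs[opt]))


def pick_by_gen(generated_text):
    """Gen-based: parse the chosen option out of the generated string.

    Only needs the input prompt and the output text.
    """
    match = re.search(r'answer is \(?([ABCD])\)?', generated_text)
    return match.group(1) if match else None


# Toy per-token log-probabilities for each option (fabricated for the demo).
toy_logprobs = {
    'A': [-2.0, -1.5],
    'B': [-0.3, -0.2],
    'C': [-1.0, -2.5],
    'D': [-3.0, -0.9],
}

ppl_choice = pick_by_ppl(toy_logprobs)
gen_choice = pick_by_gen('The correct answer is (B).')
print(ppl_choice, gen_choice)
```

The first function cannot be used when the model only returns text, which is the situation described for API-served models here. The second works with any model that can produce a string.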

13416157913 commented 2 weeks ago

Hello, thanks for your answer. Does the Qwen2 model support PPL-based evaluation?