Closed: xyfZzz closed this issue 3 months ago
Can you provide a detailed config so I can help you?
eval_openai.py:

```python
from copy import deepcopy
from mmengine.config import read_base

with read_base():
    from .datasets.teval.teval_en_gen_1ac254 import teval_datasets as teval_en_datasets
    from .datasets.teval.teval_zh_gen_1ac254 import teval_datasets as teval_zh_datasets
    # from .models.qwen.hf_qwen_7b_chat import models as hf_qwen_7b_chat_model
    # from .models.hf_internlm.hf_internlm2_chat_7b import models as hf_internlm2_chat_7b_model
    # from .models.hf_llama.hf_llama2_7b_chat import models as hf_llama2_7b_chat_model
    from .models.openai.gpt_3_5_turbo import models as gpt_3_5_model
    from .summarizers.teval import summarizer

meta_template_system_patches = {
    'internlm2-chat-7b-hf': dict(role='SYSTEM', begin='<|im_start|>system\n', end='<|im_end|>\n'),
    'internlm2-chat-20b-hf': dict(role='SYSTEM', begin='<|im_start|>system\n', end='<|im_end|>\n'),
}

# Collect every imported *_model list and patch a SYSTEM round into each
# model's meta_template if it lacks one.
_origin_models = sum([v for k, v in locals().items() if k.endswith("_model")], [])
models = []
for m in _origin_models:
    m = deepcopy(m)
    if 'meta_template' in m and 'round' in m['meta_template']:
        round = m['meta_template']['round']
        if all(r['role'].upper() != 'SYSTEM' for r in round):  # no system round
            if m['abbr'] in meta_template_system_patches:
                system_round = meta_template_system_patches[m['abbr']]
            else:
                # Fall back to cloning the HUMAN round and relabeling it.
                system_round = [r for r in round if r['role'].upper() == 'HUMAN'][0]
                system_round = deepcopy(system_round)
                system_round['role'] = 'SYSTEM'
            m['meta_template']['round'].append(system_round)
    else:
        raise ValueError(f'no meta_template.round in {m.get("abbr", None)}')
    print(f'model {m["abbr"]} is using the following meta_template: {m["meta_template"]}')
    models.append(m)

datasets = teval_en_datasets + teval_zh_datasets
work_dir = './outputs/teval'
```
Then I printed the length of `inputs` in the `generate` function of the openai script:
```python
def generate(
    self,
    inputs: List[str or PromptList],
    max_out_len: int = 512,
    temperature: float = 0.7,
) -> List[str]:
    """Generate results given a list of inputs.

    Args:
        inputs (List[str or PromptList]): A list of strings or PromptDicts.
            The PromptDict should be organized in OpenCompass' API format.
        max_out_len (int): The maximum length of the output.
        temperature (float): What sampling temperature to use, between 0
            and 2. Higher values like 0.8 will make the output more
            random, while lower values like 0.2 will make it more focused
            and deterministic. Defaults to 0.7.

    Returns:
        List[str]: A list of generated strings.
    """
    if self.temperature is not None:
        temperature = self.temperature
    print("openai len(inputs): ", len(inputs))
    with ThreadPoolExecutor() as executor:
        results = list(
            executor.map(self._generate, inputs,
                         [max_out_len] * len(inputs),
                         [temperature] * len(inputs)))
    return results
```
The printed length is always 1, so this thread pool never actually parallelizes anything.
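For reference, the per-call batch is set by `batch_size` in the model config. A minimal sketch of an OpenAI API entry, assuming the stock `opencompass.models.OpenAI` class and the general shape of the shipped `gpt_3_5_turbo` config (not the reporter's exact file):

```python
from opencompass.models import OpenAI

models = [
    dict(
        abbr='GPT-3.5-turbo',
        type=OpenAI,
        path='gpt-3.5-turbo',
        key='ENV',        # assumption: API key read from the environment
        max_out_len=2048,
        max_seq_len=4096,
        batch_size=8,     # how many prompts the inferencer should hand to generate() at once
    ),
]
```

In this thread that batch size evidently did not reach `generate()` (len(inputs) stayed at 1), so concurrency has to come from the runner instead, as the reply below suggests.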
You can try to set a bigger `max_num_workers` in your runner.
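For reference, `max_num_workers` lives in the infer runner config. A minimal sketch, assuming the default SizePartitioner/LocalRunner setup (tune the numbers to your quota):

```python
from opencompass.partitioners import SizePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

# More, smaller partitions plus more workers = more API requests in flight.
infer = dict(
    partitioner=dict(type=SizePartitioner, max_task_size=1000),
    runner=dict(
        type=LocalRunner,
        max_num_workers=16,  # number of inference tasks run in parallel
        task=dict(type=OpenICLInferTask),
    ),
)
```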
ok
@bittersweet1999 After the API model finished inference, an error was raised saying the GPU count is wrong. I'm using an API model, so why is a GPU still needed?
```
100%|██████████| 22/22 [6:20:04<00:00, 1036.57s/it]
  0%|          | 0/16 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/xie/code/zone/github/new/opencompass_backup/run.py", line 357, in <module>
```
In the evaluation stage of T-Eval, it is recommended to use a GPU to efficiently load the transformer model that compares the model's predictions against the gold-standard answers. If you don't want to use a GPU, just change it here: https://github.com/open-compass/opencompass/blob/3098d788455dc785e6830f8c69eb9d1010c0cce1/configs/datasets/teval/teval_en_gen_1ac254.py#L39 However, be aware that evaluation will be significantly slower if you run it on a CPU instead.
Do you mean that a GPU is also needed to load a model for comparing the results? Which model is used for the comparison? Can the comparison be done with an API model instead?
Yes, a GPU is required during the evaluation stage; this step cannot be performed with an API model.
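For context, T-Eval scores some subsets by semantic similarity with a locally loaded sentence-transformer, which is why the evaluation stage wants a GPU. A minimal sketch of that comparison step, assuming the `all-mpnet-base-v2` model; the device choice is presumably the knob the linked config line toggles:

```python
from sentence_transformers import SentenceTransformer, util

# Assumption: a T-Eval-style similarity check between a prediction and the
# gold answer; device='cuda' vs device='cpu' is what makes it fast or slow.
model = SentenceTransformer('all-mpnet-base-v2', device='cuda')

pred_emb = model.encode('predicted tool-call plan', convert_to_tensor=True)
gold_emb = model.encode('gold tool-call plan', convert_to_tensor=True)
print(util.cos_sim(pred_emb, gold_emb).item())
```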
feel free to reopen it if needed
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
When evaluating with an API model, the batch size does not take effect: only one request is ever in flight at a time.
Reproduces the problem - code/configuration sample
When evaluating with an API model, the batch size does not take effect: only one request is ever in flight at a time.
Reproduces the problem - command or script
When evaluating with an API model, the batch size does not take effect: only one request is ever in flight at a time.
Reproduces the problem - error message
When evaluating with an API model, the batch size does not take effect: only one request is ever in flight at a time.
Other information
When evaluating with an API model, the batch size does not take effect: only one request is ever in flight at a time.