modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
https://evalscope.readthedocs.io/en/latest/
Apache License 2.0

No results #120

Closed lucheng07082221 closed 2 months ago

lucheng07082221 commented 2 months ago

I ran the following code:

# Copyright (c) Alibaba, Inc. and its affiliates.
import torch

from evalscope.run import run_task
from evalscope.config import TaskConfig, registry_tasks


def run_swift_eval():
    # 1. Prepare the config

    # 2. Register the custom dataset
    TaskConfig.registry(
        name='custom_dataset',      # task name
        data_pattern='general_qa',  # data format
        dataset_dir='/home/quiana/work/evalscope/medical/custom_qa',  # dataset path
        subset_list=['example']     # evaluation dataset name, i.e. the example.jsonl above
    )

    # 3. Fetch the task config by task name
    task_cfg = registry_tasks['custom_dataset']

    # 4. Configure the model and other settings
    task_cfg.update({
        'model_args': {'revision': None, 'precision': torch.float16, 'device_map': 'auto'},
        'eval_type': 'checkpoint',                    # evaluation type; keep as-is, fixed to 'checkpoint'
        'model': '/home/quiana/Downloads/Qwen2-1.5B', # model path
        'template_type': 'qwen',                      # model template type
        'outputs': 'outputs',
        'mem_cache': False,
        'limit': 10,
    })

    # Run task
    run_task(task_cfg=task_cfg)


if __name__ == '__main__':
    run_swift_eval()

Dataset format:

{"query": "中国的首都是哪里?", "response": "中国的首都是北京"}
{"query": "世界上最高的山是哪座山?", "response": "是珠穆朗玛峰"}
{"query": "为什么北极见不到企鹅?", "response": "因为企鹅大多生活在南极"}
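For reference, the general_qa pattern loads one JSON object per line from a .jsonl file named after the subset, so subset_list=['example'] expects an example.jsonl under dataset_dir (as the comment in the registration code above notes). A minimal sketch, reusing the paths and samples from this issue, that writes such a file:

import json
import os

dataset_dir = '/home/quiana/work/evalscope/medical/custom_qa'
os.makedirs(dataset_dir, exist_ok=True)

samples = [
    {"query": "中国的首都是哪里?", "response": "中国的首都是北京"},
    {"query": "世界上最高的山是哪座山?", "response": "是珠穆朗玛峰"},
    {"query": "为什么北极见不到企鹅?", "response": "因为企鹅大多生活在南极"},
]

# The file stem ('example') is what goes into subset_list.
with open(os.path.join(dataset_dir, 'example.jsonl'), 'w', encoding='utf-8') as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + '\n')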

The output is:

(base) quiana@quiana-LEGION-REN9000K-34IAZ:~/work/evalscope$ /usr/bin/python3 /home/quiana/work/evalscope/examples/example_eval_custom_llm_data.py
2024-09-02 13:52:11,644 - evalscope - INFO - Registered task: custom_dataset with data pattern: general_qa
2024-09-02 13:52:11,644 - evalscope - INFO - Args: Task config is provided with dictionary type.
2024-09-02 13:52:11,644 - evalscope - INFO - {'model_args': {'revision': None, 'precision': torch.float16, 'device_map': 'auto'}, 'generation_config': {'temperature': 0.3, 'max_length': 2048, 'max_new_tokens': 512, 'top_k': 50, 'top_p': 0.85, 'do_sample': True, 'num_beams': 1, 'repetition_penalty': 1.0}, 'dataset_args': {'general_qa': {'local_path': '/home/quiana/work/evalscope/medical/custom_qa', 'subset_list': ['example']}}, 'dry_run': False, 'model': '/home/quiana/Downloads/Qwen2-1.5B', 'eval_type': 'checkpoint', 'datasets': ['general_qa'], 'outputs': 'outputs', 'use_cache': False, 'stage': 'all', 'dataset_hub': 'Local', 'limit': 10, 'template_type': 'qwen', 'mem_cache': False, 'eval_backend': 'Native'}
2024-09-02 13:52:11,644 - evalscope - INFO - Set use_cache to False.
2024-09-02 13:52:12,038 - evalscope - WARNING - Device: cuda
2024-09-02 13:52:12,038 - evalscope - WARNING - Template type: qwen
2024-09-02 13:52:12,215 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-09-02 13:52:12,216 - modelscope - INFO - Loading ast index from /home/quiana/.cache/modelscope/ast_indexer
2024-09-02 13:52:12,231 - modelscope - INFO - No valid ast index found from /home/quiana/.cache/modelscope/ast_indexer, generating ast index from prebuilt!
2024-09-02 13:52:12,253 - modelscope - INFO - Loading done! Current index file version is 1.10.0, with md5 6d3c8ca6bcbe2eea5343b89ed7d11185 and a total number of 946 components indexed
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2024-09-02 13:52:14,862 - evalscope - WARNING - Got local model dir: /home/quiana/Downloads/Qwen2-1.5B
2024-09-02 13:52:14,862 - evalscope - INFO - Generation config init: {'max_length': 20, 'max_new_tokens': 2048, 'min_length': 0, 'min_new_tokens': None, 'early_stopping': False, 'max_time': None, 'do_sample': False, 'num_beams': 1, 'num_beam_groups': 1, 'penalty_alpha': None, 'use_cache': True, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'epsilon_cutoff': 0.0, 'eta_cutoff': 0.0, 'diversity_penalty': 0.0, 'repetition_penalty': 1.0, 'encoder_repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'bad_words_ids': None, 'force_words_ids': None, 'renormalize_logits': False, 'constraints': None, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'forced_decoder_ids': None, 'sequence_bias': None, 'guidance_scale': None, 'low_memory': None, 'num_return_sequences': 1, 'output_attentions': False, 'output_hidden_states': False, 'output_scores': False, 'output_logits': None, 'return_dict_in_generate': False, 'pad_token_id': 151643, 'bos_token_id': 151643, 'eos_token_id': 151643, 'encoder_no_repeat_ngram_size': 0, 'decoder_start_token_id': None, 'num_assistant_tokens': 5, 'num_assistant_tokens_schedule': 'heuristic', 'cache_implementation': None, 'prompt_lookup_num_tokens': None, 'max_matching_ngram_size': None, 'generation_kwargs': {}, '_from_model_config': False, 'transformers_version': '4.40.2'}
2024-09-02 13:52:14,862 - evalscope - INFO - Evaluating on subsets for general_qa: ['example']

2024-09-02 13:52:14,862 - evalscope - INFO - ** Use default settings: few_shot_num: None, few_shot_split: None, target_eval_split: test
2024-09-02 13:52:14,862 - evalscope - INFO - Start evaluating on dataset /home/quiana/work/evalscope/medical/custom_qa
Predicting(default): : 0it [00:00, ?it/s]
2024-09-02 13:52:14,863 - evalscope - ERROR - Got empty predictions on subset default of dataset: /home/quiana/work/evalscope/medical/custom_qa
2024-09-02 13:52:14,863 - evalscope - INFO - Dump data to /home/quiana/.cache/evalscope/outputs/eval_general_qa_01686928d81b34022d22d37943b071b1_default/predictions/_home_quiana_work_evalscope_medical_custom_qa_default.jsonl successfully.
Reviewing(default): : 0it [00:00, ?it/s]
2024-09-02 13:52:14,863 - evalscope - INFO - Dump data to /home/quiana/.cache/evalscope/outputs/eval_general_qa_01686928d81b34022d22d37943b071b1_default/reviews/_home_quiana_work_evalscope_medical_custom_qa_default.jsonl successfully.
2024-09-02 13:52:14,863 - evalscope - INFO - Dump report: _home_quiana_work_evalscope_medical_custom_qa.json

2024-09-02 13:52:14,864 - evalscope - INFO - ** Report table:
+----------------------------------+--------------+--------------+
| Model                            | general_qa   | general_qa   |
+==================================+==============+==============+
| 01686928d81b34022d22d37943b071b1 |              |              |
+----------------------------------+--------------+--------------+

2024-09-02 13:52:14,864 - evalscope - INFO - Dump overall task config to /home/quiana/.cache/evalscope/outputs/eval_general_qa_01686928d81b34022d22d37943b071b1_default/configs/task_output_config.yaml
2024-09-02 13:52:14,864 - evalscope - INFO - The overall task config: {'model_args': {'revision': None, 'precision': torch.float16, 'device_map': 'auto'}, 'generation_config': {'temperature': 0.3, 'max_length': 2048, 'max_new_tokens': 512, 'top_k': 50, 'top_p': 0.85, 'do_sample': True, 'num_beams': 1, 'repetition_penalty': 1.0, 'limit': 10}, 'dataset_args': {'general_qa': {'local_path': '/home/quiana/work/evalscope/medical/custom_qa', 'subset_list': ['example']}}, 'dry_run': False, 'model': '/home/quiana/Downloads/Qwen2-1.5B', 'eval_type': 'checkpoint', 'datasets': ['general_qa'], 'outputs': 'outputs', 'use_cache': False, 'stage': 'all', 'dataset_hub': 'Local', 'limit': 10, 'template_type': 'qwen', 'mem_cache': False, 'eval_backend': 'Native'}
2024-09-02 13:52:14,864 - evalscope - INFO - >> Overwrite overall_task_cfg for model_args.precision due to it is not a string
2024-09-02 13:52:14,864 - evalscope - INFO - Dump data to /home/quiana/.cache/evalscope/outputs/eval_general_qa_01686928d81b34022d22d37943b071b1_default/configs/task_output_config.yaml successfully.
2024-09-02 13:52:14,864 - evalscope - INFO - Evaluation finished on /home/quiana/work/evalscope/medical/custom_qa
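The key line is the ERROR about empty predictions on subset default, even though the task was registered with subset_list=['example']. A quick sanity check (a hypothetical snippet, not part of evalscope) is to compare the .jsonl files actually present in dataset_dir against the names in subset_list:

import os

dataset_dir = '/home/quiana/work/evalscope/medical/custom_qa'
subset_list = ['example']

# .jsonl files the directory actually contains
found = {f for f in os.listdir(dataset_dir) if f.endswith('.jsonl')}
print('Found:', sorted(found))

# Each subset is expected to map to a <subset>.jsonl file
for subset in subset_list:
    expected = f'{subset}.jsonl'
    print(expected, 'OK' if expected in found else 'MISSING')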

wangxingjun778 commented 2 months ago

Got it, I'll try to reproduce this.

lucheng07082221 commented 2 months ago

@wangxingjun778 OK, thanks for looking into it.

Yunnglin commented 2 months ago

Which files are under /home/quiana/work/evalscope/medical/custom_qa? The entries in subset_list need to match the dataset file names.
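In other words, each entry in subset_list must correspond to a <name>.jsonl file under dataset_dir (here example.jsonl for 'example'). A small sketch, assuming the same paths as above, that derives subset_list directly from the files present so the two cannot drift apart:

from pathlib import Path

from evalscope.config import TaskConfig

dataset_dir = '/home/quiana/work/evalscope/medical/custom_qa'
# e.g. ['example'] when the directory contains example.jsonl
subset_list = [p.stem for p in Path(dataset_dir).glob('*.jsonl')]

TaskConfig.registry(
    name='custom_dataset',
    data_pattern='general_qa',
    dataset_dir=dataset_dir,
    subset_list=subset_list,
)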

lucheng07082221 commented 2 months ago

@Yunnglin It runs now, thanks.