modelscope / evalscope

A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
https://evalscope.readthedocs.io/en/latest/
Apache License 2.0

module 'evaluate' has no attribute 'load' #188

Open charliedream1 opened 4 days ago

charliedream1 commented 4 days ago

问题描述 / Issue Description

请简要描述您遇到的问题。 / Please briefly describe the issue you encountered.

Testing winogrande throws an error; mmlu works fine.

Traceback (most recent call last):
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 462, in <module>
    inferencer.run()
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 114, in run
    self._score()
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 250, in _score
    result = icl_evaluator.score(**preds)
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/openicl/icl_evaluator/icl_hf_evaluator.py", line 83, in score
    metric = evaluate.load(local_path)
AttributeError: module 'evaluate' has no attribute 'load'
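A quick way to narrow this down is to check what the name `evaluate` actually resolves to at runtime. This is a hypothetical diagnostic sketch (not part of evalscope): if `evaluate` resolves to something other than the HuggingFace `evaluate` package (e.g. a shadowing local `evaluate.py`, or a very old install), `load` will be missing and this exact `AttributeError` appears.

```python
import importlib
import importlib.util

def diagnose(module_name="evaluate"):
    """Report where a module resolves from and whether it exposes `load`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name} is not installed"
    mod = importlib.import_module(module_name)
    path = getattr(mod, "__file__", "?")
    if not hasattr(mod, "load"):
        # A shadowing local file or an outdated package typically lands here.
        return f"{module_name} resolved to {path} but has no load() attribute"
    return f"{module_name} OK, load() found at {path}"

if __name__ == "__main__":
    print(diagnose())
```

If the printed path points into your own project directory rather than `site-packages`, renaming the shadowing file usually resolves the error.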

使用的工具 / Tools Used

执行的代码或指令 / Code or Commands Executed

请提供您执行的主要代码或指令。 / Please provide the main code or commands you executed. 例如 / For example:

# Copyright (c) Alibaba, Inc. and its affiliates.

"""
1. Installation
EvalScope: pip install evalscope[opencompass]

2. Download dataset to data/ folder
wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip
unzip OpenCompassData-core-20240207.zip

3. Deploy model serving
    swift deploy --model_type qwen2-1_5b-instruct

4. Run eval task
"""
from evalscope.backend.opencompass import OpenCompassBackendManager
from evalscope.run import run_task
from evalscope.summarizer import Summarizer

def run_swift_eval():

    # List all datasets
    # e.g.  ['mmlu', 'WSC', 'DRCD', 'chid', 'gsm8k', 'AX_g', 'BoolQ', 'cmnli', 'ARC_e', 'ocnli_fc', 'summedits', 'MultiRC', 'GaokaoBench', 'obqa', 'math', 'agieval', 'hellaswag', 'RTE', 'race', 'ocnli', 'strategyqa', 'triviaqa', 'WiC', 'COPA', 'piqa', 'nq', 'mbpp', 'csl', 'Xsum', 'CB', 'tnews', 'ARC_c', 'afqmc', 'eprstmt', 'ReCoRD', 'bbh', 'CMRC', 'AX_b', 'siqa', 'storycloze', 'humaneval', 'cluewsc', 'winogrande', 'lambada', 'ceval', 'bustm', 'C3', 'lcsts']
    print(
        f"** All datasets from OpenCompass backend: {OpenCompassBackendManager.list_datasets()}"
    )

    # Prepare the config
    """
    Attributes:
        `eval_backend`: defaults to 'OpenCompass'
        `datasets`: list, refer to `OpenCompassBackendManager.list_datasets()`
        `models`: list of dicts; each dict must contain `path` and `openai_api_base`
                `path`: reuse the value of `--model_type` from the `swift deploy` command line
                `openai_api_base`: the base URL of the swift model serving endpoint
        `work_dir`: str, the directory for saving evaluation results, logs, and summaries. Defaults to 'outputs/default'

        Refer to `opencompass.cli.arguments.ApiModelConfig` for other optional attributes.
    """
    # Option 1: Use dict format
    # Args:
    #   path: The model path; for swift this is the `model_type`, e.g. 'llama3-8b-instruct'
    #   is_chat: True for a chat model, False for a base model
    #   key: The OpenAI api-key for the model API, defaults to 'EMPTY'
    #   openai_api_base: The base URL of the OpenAI-compatible API, i.e. the swift model serving URL
    task_cfg = dict(
        eval_backend="OpenCompass",
        eval_config={
            "datasets": ["winogrande"],
            "models": [
                {
                    "path": "qwen2-7b-instruct",  # Please make sure the model is deployed
                    "openai_api_base": "http://127.0.0.1:8000/v1/chat/completions",
                    "is_chat": True,
                    "batch_size": 16,
                },
            ],
            "work_dir": "outputs/qwen2_eval_result",
            "limit": 10,
        },
    )

    # Option 2: Use yaml file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.yaml'

    # Option 3: Use json file
    # task_cfg = 'examples/tasks/default_eval_swift_openai_api.json'

    # Run task
    run_task(task_cfg=task_cfg)

    # [Optional] Get the final report with summarizer
    print(">> Start to get the report with summarizer ...")
    report_list = Summarizer.get_report_from_cfg(task_cfg)
    print(f"\n>>The report list: {report_list}")

if __name__ == "__main__":
    run_swift_eval()

错误日志 / Error Log

请粘贴完整的错误日志或控制台输出。 / Please paste the full error log or console output. 例如 / For example:

Traceback (most recent call last):
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 462, in <module>
    inferencer.run()
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 114, in run
    self._score()
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/tasks/openicl_eval.py", line 250, in _score
    result = icl_evaluator.score(**preds)
  File "/home/miniconda3/envs/train_py310/lib/python3.10/site-packages/opencompass/openicl/icl_evaluator/icl_hf_evaluator.py", line 83, in score
    metric = evaluate.load(local_path)
AttributeError: module 'evaluate' has no attribute 'load'

运行环境 / Runtime Environment

其他信息 / Additional Information

如果有其他相关信息,请在此处提供。 / If there is any other relevant information, please provide it here.

Yunnglin commented 2 days ago

This is a known issue; please refer to #148. It has not been fixed yet.

charliedream1 commented 2 days ago

Please fix it as soon as possible.


wangxingjun778 commented 1 day ago

Please refer to the replies in #148 and try upgrading to the latest version.
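After upgrading as suggested (e.g. `pip install -U "evalscope[opencompass]" evaluate`), a minimal version check can confirm both packages are actually installed in the active environment. This is a generic sketch using the standard library, not an evalscope API:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(pkg):
    """Return the installed version string of a package, or None if missing."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    # None here means the package is missing from the current environment.
    print("evaluate:", installed_version("evaluate"))
    print("evalscope:", installed_version("evalscope"))
```

If `evaluate` shows as missing or very old while the traceback still appears, the interpreter running the eval task is likely a different environment from the one you upgraded.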