open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] With VLLM, the run hangs when it reaches a partitioned task, while HuggingFaceCausalLM does not #1018

Closed IcyFeather233 closed 4 months ago

IcyFeather233 commented 5 months ago

Prerequisites

Issue type

I am evaluating with officially supported tasks/models/datasets.

Environment

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
Python 3.10.14
OpenCompass 02e7eec91131cec2050a0bfee3eb07f37298491f Mar 28, 2024

Reproducing the problem - code/configuration sample

Model Configs

without VLLM

from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='qwen',
        path="localpath",
        tokenizer_path='/localpath',
        model_kwargs=dict(
            device_map='auto',
            trust_remote_code=True,
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            trust_remote_code=True,
#            use_fast=False,
        ),
        pad_token_id=151643,
        min_out_len=1,
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=2, num_procs=1),
    )
]

with VLLM

from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='vllm-qwen',
        path="localpath",
        model_kwargs=dict(tensor_parallel_size=2),
        min_out_len=1,
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=2, num_procs=1),
    )
]

Eval Configs

from mmengine.config import read_base

with read_base():
    from .datasets.mmlu.mmlu_gen import mmlu_datasets
    from .models.qwen.vllm_qwen import models

datasets = [*mmlu_datasets]
models = [*models]

Reproducing the problem - command or script

python run.py configs/eval_vllm_qwen.py -w outputs/vllm_qwen --debug

Reproducing the problem - error message

04/02 05:53:17 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/02 05:53:17 - OpenCompass - DEBUG - Modules of opencompass's partitioner registry have been automatically imported from opencompass.partitioners
04/02 05:53:17 - OpenCompass - DEBUG - Get class `SizePartitioner` from "partitioner" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `SizePartitioner` instance is built from registry, and its implementation can be found in opencompass.partitioners.size
04/02 05:53:17 - OpenCompass - DEBUG - Key eval.runner.task.judge_cfg not found in config, ignored.
04/02 05:53:17 - OpenCompass - DEBUG - Key eval.runner.task.dump_details not found in config, ignored.
04/02 05:53:17 - OpenCompass - DEBUG - Key eval.given_pred not found in config, ignored.
04/02 05:53:17 - OpenCompass - DEBUG - Additional config: {}
04/02 05:53:17 - OpenCompass - DEBUG - Modules of opencompass's load_dataset registry have been automatically imported from opencompass.datasets
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:17 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:17 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:53:18 - OpenCompass - INFO - Partitioned into 15 tasks.
04/02 05:53:18 - OpenCompass - DEBUG - Task 0: [vllm-skywork/lukaemon_mmlu_professional_law_0]
04/02 05:53:18 - OpenCompass - DEBUG - Task 1: [vllm-skywork/lukaemon_mmlu_professional_law_1]
04/02 05:53:18 - OpenCompass - DEBUG - Task 2: [vllm-skywork/lukaemon_mmlu_professional_law_2]
04/02 05:53:18 - OpenCompass - DEBUG - Task 3: [vllm-skywork/lukaemon_mmlu_professional_law_3]
04/02 05:53:18 - OpenCompass - DEBUG - Task 4: [vllm-skywork/lukaemon_mmlu_professional_law_4]
04/02 05:53:18 - OpenCompass - DEBUG - Task 5: [vllm-skywork/lukaemon_mmlu_professional_law_5]
04/02 05:53:18 - OpenCompass - DEBUG - Task 6: [vllm-skywork/lukaemon_mmlu_professional_law_6]
04/02 05:53:18 - OpenCompass - DEBUG - Task 7: [vllm-skywork/lukaemon_mmlu_professional_law_7]
04/02 05:53:18 - OpenCompass - DEBUG - Task 8: [vllm-skywork/lukaemon_mmlu_moral_scenarios,vllm-skywork/lukaemon_mmlu_miscellaneous]
04/02 05:53:18 - OpenCompass - DEBUG - Task 9: [vllm-skywork/lukaemon_mmlu_professional_psychology,vllm-skywork/lukaemon_mmlu_high_school_psychology,vllm-skywork/lukaemon_mmlu_high_school_macroeconomics,vllm-skywork/lukaemon_mmlu_elementary_mathematics]
04/02 05:53:18 - OpenCompass - DEBUG - Task 10: [vllm-skywork/lukaemon_mmlu_moral_disputes,vllm-skywork/lukaemon_mmlu_prehistory,vllm-skywork/lukaemon_mmlu_philosophy,vllm-skywork/lukaemon_mmlu_high_school_biology,vllm-skywork/lukaemon_mmlu_nutrition,vllm-skywork/lukaemon_mmlu_professional_accounting]
04/02 05:53:18 - OpenCompass - DEBUG - Task 11: [vllm-skywork/lukaemon_mmlu_professional_medicine,vllm-skywork/lukaemon_mmlu_high_school_mathematics,vllm-skywork/lukaemon_mmlu_clinical_knowledge,vllm-skywork/lukaemon_mmlu_security_studies,vllm-skywork/lukaemon_mmlu_high_school_microeconomics,vllm-skywork/lukaemon_mmlu_high_school_world_history,vllm-skywork/lukaemon_mmlu_conceptual_physics,vllm-skywork/lukaemon_mmlu_marketing]
04/02 05:53:18 - OpenCompass - DEBUG - Task 12: [vllm-skywork/lukaemon_mmlu_human_aging,vllm-skywork/lukaemon_mmlu_high_school_statistics,vllm-skywork/lukaemon_mmlu_high_school_us_history,vllm-skywork/lukaemon_mmlu_high_school_chemistry,vllm-skywork/lukaemon_mmlu_sociology,vllm-skywork/lukaemon_mmlu_high_school_geography,vllm-skywork/lukaemon_mmlu_high_school_government_and_politics,vllm-skywork/lukaemon_mmlu_college_medicine,vllm-skywork/lukaemon_mmlu_world_religions,vllm-skywork/lukaemon_mmlu_virology]
04/02 05:53:18 - OpenCompass - DEBUG - Task 13: [vllm-skywork/lukaemon_mmlu_high_school_european_history,vllm-skywork/lukaemon_mmlu_logical_fallacies,vllm-skywork/lukaemon_mmlu_astronomy,vllm-skywork/lukaemon_mmlu_high_school_physics,vllm-skywork/lukaemon_mmlu_electrical_engineering,vllm-skywork/lukaemon_mmlu_college_biology,vllm-skywork/lukaemon_mmlu_anatomy,vllm-skywork/lukaemon_mmlu_human_sexuality,vllm-skywork/lukaemon_mmlu_formal_logic,vllm-skywork/lukaemon_mmlu_international_law,vllm-skywork/lukaemon_mmlu_econometrics,vllm-skywork/lukaemon_mmlu_machine_learning,vllm-skywork/lukaemon_mmlu_public_relations,vllm-skywork/lukaemon_mmlu_jurisprudence,vllm-skywork/lukaemon_mmlu_management]
04/02 05:53:18 - OpenCompass - DEBUG - Task 14: [vllm-skywork/lukaemon_mmlu_college_physics,vllm-skywork/lukaemon_mmlu_college_chemistry,vllm-skywork/lukaemon_mmlu_college_computer_science,vllm-skywork/lukaemon_mmlu_college_mathematics,vllm-skywork/lukaemon_mmlu_abstract_algebra,vllm-skywork/lukaemon_mmlu_global_facts,vllm-skywork/lukaemon_mmlu_computer_security,vllm-skywork/lukaemon_mmlu_medical_genetics,vllm-skywork/lukaemon_mmlu_high_school_computer_science,vllm-skywork/lukaemon_mmlu_business_ethics,vllm-skywork/lukaemon_mmlu_us_foreign_policy]
04/02 05:53:18 - OpenCompass - DEBUG - Modules of opencompass's runner registry have been automatically imported from opencompass.runners
04/02 05:53:18 - OpenCompass - DEBUG - Get class `LocalRunner` from "runner" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `LocalRunner` instance is built from registry, and its implementation can be found in opencompass.runners.local
04/02 05:53:18 - OpenCompass - DEBUG - Modules of opencompass's task registry have been automatically imported from opencompass.tasks
04/02 05:53:18 - OpenCompass - DEBUG - Get class `OpenICLInferTask` from "task" registry in "opencompass"
04/02 05:53:18 - OpenCompass - DEBUG - An `OpenICLInferTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
04/02 05:53:19 - OpenCompass - INFO - Task [vllm-skywork/lukaemon_mmlu_professional_law_0]
04/02 05:53:19 - OpenCompass - DEBUG - Modules of opencompass's model registry have been automatically imported from opencompass.models
04/02 05:53:19 - OpenCompass - DEBUG - Get class `VLLM` from "model" registry in "opencompass"
2024-04-02 05:53:23,773 WARNING utils.py:580 -- Detecting docker specified CPUs. In previous versions of Ray, CPU detection in containers was incorrect. Please ensure that Ray has enough CPUs allocated. As a temporary workaround to revert to the prior behavior, set `RAY_USE_MULTIPROCESSING_CPU_COUNT=1` as an env var before starting Ray. Set the env var: `RAY_DISABLE_DOCKER_CPU_WARNING=1` to mute this warning.
2024-04-02 05:53:23,876 INFO worker.py:1752 -- Started a local Ray instance.
INFO 04-02 05:53:30 llm_engine.py:75] Initializing an LLM engine (v0.4.0) with config: model='/maindata/data/shared/public/dehao.li/wxb_online_model/merge_base_safe_lora_qwen14b_chat240323', tokenizer='/maindata/data/shared/public/dehao.li/wxb_online_model/merge_base_safe_lora_qwen14b_chat240323', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 04-02 05:53:54 selector.py:45] Cannot use FlashAttention because the package is not found. Please install it for better performance.
INFO 04-02 05:53:54 selector.py:21] Using XFormers backend.
(RayWorkerVllm pid=42130) INFO 04-02 05:53:54 selector.py:45] Cannot use FlashAttention because the package is not found. Please install it for better performance.
(RayWorkerVllm pid=42130) INFO 04-02 05:53:54 selector.py:21] Using XFormers backend.
(RayWorkerVllm pid=42130) INFO 04-02 05:53:57 pynccl_utils.py:45] vLLM is using nccl==2.18.1
INFO 04-02 05:53:57 pynccl_utils.py:45] vLLM is using nccl==2.18.1
(RayWorkerVllm pid=42130) INFO 04-02 05:54:39 model_runner.py:104] Loading model weights took 13.2904 GB
INFO 04-02 05:54:39 model_runner.py:104] Loading model weights took 13.2904 GB
INFO 04-02 05:54:46 ray_gpu_executor.py:240] # GPU blocks: 8817, # CPU blocks: 655
INFO 04-02 05:54:48 model_runner.py:791] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 04-02 05:54:48 model_runner.py:795] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
(RayWorkerVllm pid=42130) INFO 04-02 05:54:48 model_runner.py:791] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
(RayWorkerVllm pid=42130) INFO 04-02 05:54:48 model_runner.py:795] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 04-02 05:54:55 model_runner.py:867] Graph capturing finished in 7 secs.
(RayWorkerVllm pid=42130) INFO 04-02 05:54:55 model_runner.py:867] Graph capturing finished in 7 secs.
04/02 05:54:55 - OpenCompass - DEBUG - An `VLLM` instance is built from registry, and its implementation can be found in opencompass.models.vllm
04/02 05:54:55 - OpenCompass - DEBUG - Get class `MMLUDataset` from "load_dataset" registry in "opencompass"
04/02 05:54:55 - OpenCompass - DEBUG - An `MMLUDataset` instance is built from registry, and its implementation can be found in opencompass.datasets.mmlu
04/02 05:54:55 - OpenCompass - INFO - Start inferencing [vllm-skywork/lukaemon_mmlu_professional_law_0]
04/02 05:54:55 - OpenCompass - DEBUG - Modules of opencompass's icl_prompt_templates registry have been automatically imported from opencompass.openicl.icl_prompt_template
04/02 05:54:55 - OpenCompass - DEBUG - Get class `PromptTemplate` from "icl_prompt_templates" registry in "opencompass"
04/02 05:54:55 - OpenCompass - DEBUG - An `PromptTemplate` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_prompt_template
04/02 05:54:55 - OpenCompass - DEBUG - Get class `PromptTemplate` from "icl_prompt_templates" registry in "opencompass"
04/02 05:54:55 - OpenCompass - DEBUG - An `PromptTemplate` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_prompt_template
04/02 05:54:55 - OpenCompass - DEBUG - Modules of opencompass's icl_retrievers registry have been automatically imported from opencompass.openicl.icl_retriever
04/02 05:54:55 - OpenCompass - DEBUG - Get class `FixKRetriever` from "icl_retrievers" registry in "opencompass"
04/02 05:54:55 - OpenCompass - DEBUG - An `FixKRetriever` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_retriever.icl_fix_k_retriever
04/02 05:54:55 - OpenCompass - DEBUG - Modules of opencompass's icl_inferencers registry have been automatically imported from opencompass.openicl.icl_inferencer
04/02 05:54:55 - OpenCompass - DEBUG - Get class `GenInferencer` from "icl_inferencers" registry in "opencompass"
04/02 05:54:55 - OpenCompass - DEBUG - An `GenInferencer` instance is built from registry, and its implementation can be found in opencompass.openicl.icl_inferencer.icl_gen_inferencer
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 192/192 [00:00<00:00, 2739137.31it/s]
[2024-04-02 05:54:56,458] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.41it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.34it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.36it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.35it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.67it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.39it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.45it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.43it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.48it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.34it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.72it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:03<00:00,  4.66it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:43<00:00,  3.65s/it]
04/02 05:55:40 - OpenCompass - DEBUG - Get class `OpenICLInferTask` from "task" registry in "opencompass"
04/02 05:55:40 - OpenCompass - DEBUG - An `OpenICLInferTask` instance is built from registry, and its implementation can be found in opencompass.tasks.openicl_infer
04/02 05:55:41 - OpenCompass - INFO - Task [vllm-skywork/lukaemon_mmlu_professional_law_1]
04/02 05:55:41 - OpenCompass - DEBUG - Get class `VLLM` from "model" registry in "opencompass"
2024-04-02 05:55:41,632 INFO worker.py:1585 -- Calling ray.init() again after it has already been called.

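The run hangs at this point and makes no further progress. If I switch to the model config that does not use VLLM, it moves on to the second partitioned task without hanging.

Other information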

No response

IcyFeather233 commented 5 months ago

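For now, the problem can be worked around by setting --max-partition-size to a very large value at run time, so that the dataset is not partitioned. That is only a temporary workaround, though, and I hope the developers can look into a proper fix.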

ww0o0 commented 5 months ago

I ran into this problem too: with vllm, the run hangs after the first task has finished.

ww0o0 commented 5 months ago

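For now, the problem can be worked around by setting --max-partition-size to a very large value at run time, so that the dataset is not partitioned. That is only a temporary workaround, though, and I hope the developers can look into a proper fix.

--max-partition-size solves it for a single dataset, but when several datasets are evaluated the run is still split into multiple tasks and the same problem appears.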

Zbaoli commented 5 months ago

Same problem here. I get the "Calling ray.init() again after it has already been called." message.

Zbaoli commented 5 months ago

In opencompass/models/vllm.py:

import ray

# Shut down any Ray instance left over from the previous task.
if ray.is_initialized():
    self.logger.info('shutdown ray instance to avoid "Calling ray.init() again" error.')
    ray.shutdown()

Add the snippet above before the vllm LLM class is instantiated, at around line 52.

IcyFeather233 commented 5 months ago

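In opencompass/models/vllm.py:

import ray

# Shut down any Ray instance left over from the previous task.
if ray.is_initialized():
    self.logger.info('shutdown ray instance to avoid "Calling ray.init() again" error.')
    ray.shutdown()

Add the snippet above before the vllm LLM class is instantiated, at around line 52.

After applying this method, I found that in the single-model, multi-dataset case Ray seems to be restarted along with the model for every new dataset. That is, each dataset that gets processed prints: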

2024-04-12 01:59:04,123 INFO worker.py:1743 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266 
INFO 04-12 01:59:44 llm_engine.py:75] Initializing an LLM engine (v0.4.0) with config: model='xxx', tokenizer='xxx)

(RayWorkerVllm pid=108846) INFO 04-12 02:01:37 selector.py:16] Using FlashAttention backend.

However, I found that this process is very time-consuming. Is there a way to start Ray only once and run through all the datasets in one go?