open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] Subjective Evaluation Guide / Subjective Evaluation does not work as expected #586

Closed. nijisakai closed this issue 8 months ago.

nijisakai commented 8 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': '/usr/local/cuda',
 'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
 'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A800 80GB PCIe',
 'MMEngine': '0.8.5',
 'NVCC': 'Cuda compilation tools, release 12.2, V12.2.140',
 'OpenCV': '4.8.1',
 'PyTorch': '2.1.0',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2023.1-Product Build 20230303 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.1.1 (Git Hash '
                              '64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX512\n'
                              '  - CUDA Runtime 12.1\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 8.9.2\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wno-psabi '
                              '-Wno-error=pedantic -Wno-error=old-style-cast '
                              '-Wno-invalid-partial-specialization '
                              '-Wno-unused-private-field '
                              '-Wno-aligned-allocation-unavailable '
                              '-Wno-missing-braces -fdiagnostics-color=always '
                              '-faligned-new -Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Werror=cast-function-type '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, '
                              'TORCH_DISABLE_GPU_ASSERTS=ON, '
                              'TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, '
                              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
                              'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
                              'USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, '
                              'USE_OPENMP=ON, USE_ROCM=OFF, \n',
 'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
 'TorchVision': '0.16.0',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.1.5+d7ff933',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

**The first error I'm stuck on: when using the sample code, config loading fails in what looks like a loop of missing files:**

# Import the datasets and the subjective evaluation summarizer
from mmengine.config import read_base
with read_base():
    from .datasets.subjective_cmp.subjective_cmp import subjective_datasets
    from .summarizers.subjective import summarizer

datasets = [*subjective_datasets]

from opencompass.models import HuggingFaceCausalLM, HuggingFace, OpenAI

# Import the partitioner and task required for subjective evaluation
from opencompass.partitioners.sub_naive import SubjectiveNaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks.subjective_eval import SubjectiveEvalTask

# Define the model configs needed for inference and evaluation,
# including the inference models chatglm2-6b, qwen-7b-chat, internlm-chat-7b and the judge model gpt4
models = [...]

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True)
    ],
    reserved_roles=[
        dict(role='SYSTEM', api_role='SYSTEM'),
    ],
)

# Define the subjective evaluation config
eval = dict(
    partitioner=dict(
        type=SubjectiveNaivePartitioner,
        mode='all',  # new parameter: comparison pairs are built alternately in both orderings
    ),
    runner=dict(
        type=LocalRunner,
        max_num_workers=2,  # supports parallel comparison
        task=dict(
            type=SubjectiveEvalTask,  # new task that reads in the outputs of a pair of models
            judge_cfg=dict(
                abbr='GPT4',
                type=OpenAI,
                path='gpt-4-0613',
                key='ENV',
                meta_template=api_meta_template,
                query_per_second=1,
                max_out_len=2048,
                max_seq_len=2048,
                batch_size=2),
        )),
)

If I don't comment out the line from .summarizers.subjective import summarizer, it causes the following error. Only after it had created several nested folders did I realize it may be a loop caused by some bug.

(opencompass) [chenhy@centos7gpu configs]$ python ../run.py subjective.py -r
[2023-11-14 11:15:46,033] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home2/chenhy/opencompass/configs/../run.py", line 323, in <module>
    main()
  File "/home2/chenhy/opencompass/configs/../run.py", line 196, in main
    cfg = get_config_from_arg(args)
  File "/home2/chenhy/opencompass/opencompass/utils/run.py", line 59, in get_config_from_arg
    return Config.fromfile(args.config, format_python_code=False)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 490, in fromfile
    raise e
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 488, in fromfile
    cfg_dict, imported_names = Config._parse_lazy_import(filename)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 1075, in _parse_lazy_import
    _base_cfg_dict, _base_imported_names = Config._parse_lazy_import(  # noqa: E501
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 1075, in _parse_lazy_import
    _base_cfg_dict, _base_imported_names = Config._parse_lazy_import(  # noqa: E501
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 1068, in _parse_lazy_import
    raise ConfigParsingError(
mmengine.config.utils.ConfigParsingError: summarizers/summarizers/summarizers/subjective.py not found! It means that incorrect module is defined in `with read_base(): = from .summarizers.subjective import ...`, please make sure the base config module is valid and is consistent with the prior import logic

**Another question is how to import models. I tried the following, and all attempts failed:**

from .models.qwen.hf_qwen_7b_chat import hf_qwen_7b_chat
from .models.chatglm.hf_chatglm2_6b import hf_chatglm2_6b
from .models.hf_internlm.hf_internlm_chat_7b import hf_internlm_chat_7b

# Define the model configs needed for inference and evaluation,
# including the inference models chatglm2-6b, qwen-7b-chat, internlm-chat-7b and the judge model gpt4
#models = [hf_qwen_7b_chat, hf_chatglm2_6b, hf_internlm_chat_7b]
models = [...]
#models = [chatglm2-6b,qwen-7b-chat,internlm-chat-7b]

Reproduces the problem - command or script

python ../run.py subjective.py -r

Reproduces the problem - error message

[2023-11-14 11:45:23,315] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home2/chenhy/opencompass/configs/../run.py", line 323, in <module>
    main()
  File "/home2/chenhy/opencompass/configs/../run.py", line 196, in main
    cfg = get_config_from_arg(args)
  File "/home2/chenhy/opencompass/opencompass/utils/run.py", line 59, in get_config_from_arg
    return Config.fromfile(args.config, format_python_code=False)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 456, in fromfile
    lazy_import is None and not Config._is_lazy_import(filename):
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 1657, in _is_lazy_import
    parsed_codes = ast.parse(codes_str)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 16
    from .models.qwen.hf_qwen_7b_chat import qwen-7b-chat
                                                  ^
SyntaxError: invalid decimal literal
^C^[[A^C(opencompass) [chenhy@centos7gpu configs]$ nano subjective.py
(opencompass) [chenhy@centos7gpu configs]$ python ../run.py subjective.py -r
[2023-11-14 11:46:41,188] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/home2/chenhy/opencompass/configs/../run.py", line 323, in <module>
    main()
  File "/home2/chenhy/opencompass/configs/../run.py", line 196, in main
    cfg = get_config_from_arg(args)
  File "/home2/chenhy/opencompass/opencompass/utils/run.py", line 59, in get_config_from_arg
    return Config.fromfile(args.config, format_python_code=False)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 456, in fromfile
    lazy_import is None and not Config._is_lazy_import(filename):
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/site-packages/mmengine/config/config.py", line 1657, in _is_lazy_import
    parsed_codes = ast.parse(codes_str)
  File "/home2/chenhy/anaconda3/envs/opencompass/lib/python3.10/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 24
    models = [chatglm2-6b,qwen-7b-chat,internlm-chat-7b]
                       ^
SyntaxError: invalid decimal literal
(opencompass) [chenhy@centos7gpu configs]$ nano subjective.py
(opencompass) [chenhy@centos7gpu configs]$ python ../run.py subjective.py -r
[2023-11-14 11:47:28,706] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
11/14 11:47:30 - OpenCompass - INFO - Reusing experiements from 20231114_111855
11/14 11:47:30 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
Traceback (most recent call last):
  File "/home2/chenhy/opencompass/configs/../run.py", line 323, in <module>
    main()
  File "/home2/chenhy/opencompass/configs/../run.py", line 272, in main
    tasks = partitioner(cfg)
  File "/home2/chenhy/opencompass/opencompass/partitioners/base.py", line 68, in __call__
    tasks = self.partition(models,
  File "/home2/chenhy/opencompass/opencompass/partitioners/size.py", line 90, in partition
    filename = get_infer_output_path(model, dataset, out_dir)
  File "/home2/chenhy/opencompass/opencompass/utils/abbr.py", line 44, in get_infer_output_path
    model_abbr = model_abbr_from_cfg(model_cfg)
  File "/home2/chenhy/opencompass/opencompass/utils/abbr.py", line 9, in model_abbr_from_cfg
    if 'abbr' in cfg:
TypeError: argument of type 'ellipsis' is not iterable

Other information

No response

tonysy commented 8 months ago

#589

tonysy commented 8 months ago

Please try the latest commit. Feel free to re-open if needed.
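
For reference, a minimal sketch of updating the checkout (assuming an editable install from the repository root; the path is taken from the tracebacks above and should be adjusted to your setup):

cd /home2/chenhy/opencompass   # repository root
git pull                       # pull the latest commit containing the fix
pip install -e .               # reinstall in editable mode so the updated code is picked up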

frankweijue commented 8 months ago

For the first question:

There were some bugs in the code, and they have been fixed in #589. Please use python run.py configs/subjective.py to ensure the correct import of the summarizer.
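
A minimal sketch of how that invocation differs from the one in the report (the repository path is taken from the tracebacks above; the -r flag for reusing previous results is optional):

# run from the repository root, not from inside configs/
cd /home2/chenhy/opencompass
python run.py configs/subjective.py -r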

For the second question:

Importing the model configuration files should be done inside the read_base() context. You can refer to the updated configs/subjective.py for more details.
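
A minimal sketch of that pattern (assuming, as is the usual OpenCompass convention, that each model config file exports a models list; the file paths are the ones tried above and may differ in your checkout):

from mmengine.config import read_base

with read_base():
    # model configs must be imported inside read_base() so they are parsed as base configs
    from .models.qwen.hf_qwen_7b_chat import models as qwen_7b_chat
    from .models.chatglm.hf_chatglm2_6b import models as chatglm2_6b
    from .models.hf_internlm.hf_internlm_chat_7b import models as internlm_chat_7b

# each imported name is a list of model dicts, so unpack them into a single models list
models = [*qwen_7b_chat, *chatglm2_6b, *internlm_chat_7b]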

Thank you for your feedback!