open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] KeyError: 'OpenICLInferTask is already registered in task at opencompass.tasks.openicl_infer' #1327

Closed: noforit closed this issue 1 month ago

noforit commented 1 month ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': False,
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0',
'MMEngine': '0.10.4',
'MUSA available': False,
'OpenCV': '4.10.0',
'PyTorch': '2.3.0+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.3.6 (Git Hash '
'86e6af5974177e513fd3fee58425e1063e7f1361)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=8.9.2, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=pedantic '
'-Wno-error=old-style-cast -Wno-missing-braces '
'-fdiagnostics-color=always -faligned-new '
'-Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, '
'USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, '
'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, '
'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
'USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]',
'TorchVision': '0.18.0+cu121',
'numpy_random_seed': 2147483648,
'opencompass': '0.2.5+a77b8a5',
'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

eval_base.py

```python
from mmengine.config import read_base

with read_base():
    from .models.openbmb.vllm_minicpm_2b_base import models
    from .datasets.collections.leaderboard.base import datasets
    from .summarizers.leaderboard import summarizer
```

vllm_minicpm_2b_base.py

```python
from opencompass.models import HuggingFace, VLLM

# _meta_template is referenced below, but its definition is not included in this report.

models = [
    dict(
        type=VLLM,
        abbr='minicpm-2b-ckpt260000',
        path='/home/jishiyu/hf_model/openbmb/MiniCPM-2B-history/ckpt260000',
        model_kwargs=dict(tensor_parallel_size=4),
        meta_template=_meta_template,
        max_out_len=100,
        max_seq_len=2048,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=4, num_procs=1),
        # end_str='<用户>',
    )
]
```

Reproduces the problem - command or script

```shell
# slurm config

. "$HOME"/miniconda3/etc/profile.d/conda.sh
conda activate opencompass25

config=eval_base.py
ray stop
ray start
CUDA_VISIBLE_DEVICES=0,1,2,3 python run.py configs/$config
```

Reproduces the problem - error message

2024-07-17 00:22:44,715 INFO worker.py:1771 -- Started a local Ray instance.
INFO 07-17 00:23:25 config.py:623] Defaulting to use mp for distributed inference
INFO 07-17 00:23:25 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='/home/jishiyu/hf_model/openbmb/MiniCPM-2B-history/ckpt260000', speculative_config=None, tokenizer='/home/jishiyu/hf_model/openbmb/MiniCPM-2B-history/ckpt260000', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=/home/jishiyu/hf_model/openbmb/MiniCPM-2B-history/ckpt260000)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jishiyu/code/opencompass25/opencompass/tasks/openicl_infer.py", line 21, in <module>
    class OpenICLInferTask(BaseTask):
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/site-packages/mmengine/registry/registry.py", line 666, in _register
    self._register_module(module=module, module_name=name, force=force)
  File "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/site-packages/mmengine/registry/registry.py", line 611, in _register_module
    raise KeyError(f'{name} is already registered in {self.name} '
KeyError: 'OpenICLInferTask is already registered in task at opencompass.tasks.openicl_infer'
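The traceback points at the mechanism: vLLM defaults to multiprocessing ("mp") for this 4-way tensor-parallel run, and the spawned worker re-executes the parent's main script, opencompass/tasks/openicl_infer.py, through multiprocessing's _fixup_main_from_path. In a spawned child that script runs under the module name `__mp_main__`, not `__main__`. The error message says the class was already registered from `opencompass.tasks.openicl_infer`, i.e. the same file imported as an ordinary package module, so the class body at line 21 most likely registers it a second time. The pure-Python toy below sketches that sequence under stated assumptions: `ToyRegistry`, `define_task` and the `force=(module_name == '__main__')` guard are hypothetical stand-ins, not mmengine's or OpenCompass's actual code, and they only mimic the duplicate-name check that raises in registry.py.

```python
from typing import Callable, Dict, Type


class ToyRegistry:
    """Hypothetical stand-in for an mmengine-style registry (not the real class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.modules: Dict[str, Type] = {}

    def register_module(self, force: bool = False) -> Callable[[Type], Type]:
        def _register(cls: Type) -> Type:
            key = cls.__name__
            # Mimics the duplicate-name check: raise unless force=True.
            if not force and key in self.modules:
                existing = self.modules[key]
                raise KeyError(f'{key} is already registered in {self.name} '
                               f'at {existing.__module__}')
            self.modules[key] = cls
            return cls
        return _register


TASKS = ToyRegistry('task')


def define_task(module_name: str) -> Type:
    """Simulate running the task file's class body under a given module name."""
    cls = type('OpenICLInferTask', (), {'__module__': module_name})
    # Hypothetical guard: only force re-registration when run as a plain script.
    return TASKS.register_module(force=(module_name == '__main__'))(cls)


# Simulated step 1: importing the package registers the class under its module name.
define_task('opencompass.tasks.openicl_infer')

# Simulated step 2: the spawned worker re-runs the main script as '__mp_main__',
# so the '__main__' guard is False and the duplicate registration raises.
message = ''
try:
    define_task('__mp_main__')
except KeyError as err:
    message = str(err)
print(message)
```

Calling `define_task('__main__')` instead takes the forced path and overwrites the entry, which under this assumed guard would explain why a plain script run in the parent process does not fail.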

Other information

No response

Mor-Li commented 1 month ago

I encountered this issue previously, and a workaround that worked for me is to edit the file "/home/jishiyu/miniconda3/envs/opencompass25/lib/python3.10/site-packages/mmengine/registry/registry.py" so that the KeyError is caught and ignored instead of raised. This prevents the error when OpenICLInferTask is already registered.

# In the nested _register() of mmengine/registry/registry.py (line 666 in the traceback):
try:
    self._register_module(module=module, module_name=name, force=force)
except KeyError:
    pass

The root cause may need a more in-depth investigation, but this simple modification should resolve your problem by ignoring the duplicate registration instead of failing on it.
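The patch works because the duplicate check in `_register_module` raises before the new class is stored, so swallowing the `KeyError` leaves the first registration (the one from `opencompass.tasks.openicl_infer`) in place and simply skips the duplicate; passing `force=True` would overwrite it instead. The toy helper below contrasts these policies as a sketch only: `register` and its `on_duplicate` parameter are hypothetical and not part of mmengine. Note that editing site-packages applies to every registry in that environment, so genuine name clashes elsewhere would also go unreported.

```python
from typing import Dict, Literal

Policy = Literal['raise', 'ignore', 'overwrite']


def register(table: Dict[str, type], cls: type,
             on_duplicate: Policy = 'raise') -> None:
    """Hypothetical registration helper contrasting duplicate-name policies."""
    name = cls.__name__
    if name in table:
        if on_duplicate == 'raise':   # like mmengine's default, force=False
            raise KeyError(f'{name} is already registered')
        if on_duplicate == 'ignore':  # like the try/except KeyError patch
            return
    table[name] = cls                 # first registration, or a force=True overwrite


first = type('OpenICLInferTask', (), {'__module__': 'opencompass.tasks.openicl_infer'})
second = type('OpenICLInferTask', (), {'__module__': '__mp_main__'})

ignored: Dict[str, type] = {}
register(ignored, first)
register(ignored, second, on_duplicate='ignore')
print(ignored['OpenICLInferTask'].__module__)  # opencompass.tasks.openicl_infer

forced: Dict[str, type] = {}
register(forced, first)
register(forced, second, on_duplicate='overwrite')
print(forced['OpenICLInferTask'].__module__)  # __mp_main__
```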

noforit commented 1 month ago

Thanks a lot, it works!

Mor-Li commented 1 month ago

No problem! By the way, I'd also suggest pulling the latest code with git pull; I think the bug you reported may have been fixed in PR #1311.

nanxue2023 commented 1 month ago

Pulling the latest code (including PR #1311) doesn't fix it for me; it only works if I also replace the code with the workaround:

try:
    self._register_module(module=module, module_name=name, force=force)
except KeyError:
    pass