open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0

[Bug] Hellaswag evaluation fails #1166

Open SefaZeng opened 6 months ago

SefaZeng commented 6 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': '/usr/local/cuda',
 'GCC': 'gcc (GCC) 8.3.0',
 'GPU 0,1,2,3,4,5,6,7': 'Tesla V100-SXM2-32GB',
 'MMEngine': '0.9.0',
 'NVCC': 'Cuda compilation tools, release 11.3, V11.3.109',
 'OpenCV': '4.6.0',
 'PyTorch': '1.11.0',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 8.3\n'
                              '  - C++ Version: 201402\n'
                              '  - Intel(R) MKL-DNN v2.5.2 (Git Hash '
                              'a9302535553c73243c632ad3c4c80beec3d19a1e)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX2\n'
                              '  - CUDA Runtime 11.3\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80\n'
                              '  - CuDNN 8.2\n'
                              '  - Build settings: BUILD_TYPE=Release, '
                              'CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, '
                              'CXX_COMPILER=/usr/local/gcc/bin/g++, CXX_FLAGS= '
                              '-Wno-deprecated -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -fopenmp -DNDEBUG '
                              '-DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK '
                              '-DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK '
                              '-DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-DEDGE_PROFILER_USE_KINETO -O2 -fPIC '
                              '-Wno-narrowing -Wall -Wextra '
                              '-Werror=return-type '
                              '-Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-sign-compare '
                              '-Wno-unused-parameter -Wno-unused-function '
                              '-Wno-unused-result -Wno-unused-local-typedefs '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-error=deprecated-declarations '
                              '-Wno-stringop-overflow -Wno-psabi '
                              '-Wno-error=pedantic -Wno-error=redundant-decls '
                              '-Wno-error=old-style-cast '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Werror=cast-function-type '
                              '-Wno-stringop-overflow, '
                              'FORCE_FALLBACK_CUDA_MPI=1, PERF_WITH_AVX=1, '
                              'PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, '
                              'TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=ON, '
                              'USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, '
                              'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, '
                              'USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=1, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n',
 'Python': '3.8.12 (default, Jun 13 2022, 19:37:57) [GCC 8.3.0]',
 'TorchVision': '0.12.0',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.2.4+aa2dd2b',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

    python3 ${OPENCOMPASS_PATH}/run.py --work-dir ${OPENCOMPASS_PATH} \
        --datasets hellaswag_ppl \
        --hf-path ${model_path} \
        --config-dir ${OPENCOMPASS_PATH}/configs \
        --model-kwargs device_map='auto' \
        --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True add_special_tokens=False \
        --max-seq-len 4096 \
        --max-out-len 100 \
        --batch-size 2  \
        --reuse results/${model_name} \
        --num-gpus 8

Reproduces the problem - command or script

Same command as the code/configuration sample above.

Reproduces the problem - error message

05/14 19:39:24 - OpenCompass - INFO - Task [global_step320000_hf_hf/hellaswag]
/usr/local/python/lib/python3.8/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[2024-05-14 19:39:24,759] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
05/14 19:39:29 - OpenCompass - WARNING - pad_token_id is not set for the tokenizer.
05/14 19:39:29 - OpenCompass - WARNING - Using eos_token_id 2 as pad_token_id.
05/14 19:39:52 - OpenCompass - INFO - Start inferencing [global_step320000_hf_hf/hellaswag]

No chat template is defined for this tokenizer - using a default chat template that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

Traceback (most recent call last):
  File "/workspace/opencompass-20240514/opencompass/tasks/openicl_infer.py", line 162, in <module>
    inferencer.run()
  File "/workspace/opencompass-20240514/opencompass/tasks/openicl_infer.py", line 90, in run 
    self._inference()
  File "/workspace/opencompass-20240514/opencompass/tasks/openicl_infer.py", line 135, in _inference
    inferencer.inference(retriever,
  File "/workspace/opencompass-20240514/opencompass/openicl/icl_inferencer/icl_ppl_inferencer.py", line 113, in inference
    prompt_token_num = self.model.get_token_len_from_template(prompt, mode='ppl')
  File "/workspace/opencompass-20240514/opencompass/models/base.py", line 189, in get_token_len_from_template
    token_lens = [self.get_token_len(prompt) for prompt in prompts]
  File "/workspace/opencompass-20240514/opencompass/models/base.py", line 189, in <listcomp>
    token_lens = [self.get_token_len(prompt) for prompt in prompts]
  File "/workspace/opencompass-20240514/opencompass/models/huggingface_above_v4_33.py", line 266, in get_token_len
    t = self.tokenizer.apply_chat_template(m, add_generation_prompt=True, return_dict=True)
  File "/usr/local/python/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1745, in apply_chat_template
    rendered = compiled_template.render(
  File "/usr/local/python/lib/python3.8/site-packages/jinja2/environment.py", line 1301, in render
    self.environment.handle_exception()
  File "/usr/local/python/lib/python3.8/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 2, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 230) of binary: /usr/local/python/bin/python3.8
Traceback (most recent call last):
  File "/usr/local/python/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/python/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/python/lib/python3.8/site-packages/torch/distributed/run.py", line 724, in main
    run(args)
  File "/usr/local/python/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run 
    elastic_launch(
  File "/usr/local/python/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/python/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
/workspace/opencompass-20240514/opencompass/tasks/openicl_infer.py FAILED
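
For what it's worth, the crash can be reproduced in isolation. The tokenizer has no chat template, so transformers falls back to a default ChatML template (the warning above), and that template reads message['content']; the PPL inferencer apparently passes dicts that carry the text under a different key, hence the jinja2 UndefinedError. A minimal sketch, assuming a tokenizer without its own chat template and a transformers version that still falls back to the default template (the 'prompt' key is an assumed stand-in, not necessarily the exact key OpenCompass uses):

    from transformers import AutoTokenizer

    # Any tokenizer without a chat template of its own will do here; on
    # transformers versions that still provide a default fallback,
    # apply_chat_template renders a ChatML template that reads
    # message['content'].
    tok = AutoTokenizer.from_pretrained("gpt2")

    # The text sits under a key other than 'content' ('prompt' is an
    # assumed stand-in), so the template lookup fails.
    messages = [{"role": "HUMAN", "prompt": "A man is sitting on a roof. He"}]

    tok.apply_chat_template(messages, add_generation_prompt=True)
    # jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'content'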

Other information

No response

JacquelineXu commented 6 months ago

Ran into the same problem... my script is as follows:

    python run.py --datasets ceval_gen --hf-path /data/ptm/internlm2-chat-1_8b --tokenizer-path /data/ptm/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug

Digging into the error, the huggingface_above_v4_33.py part looked odd. I printed the model_cfg at line 73 of opencompass/opencompass/tasks/openicl_infer.py and found that its type is opencompass.models.huggingface_above_v4_33.HuggingFacewithChatTemplate; I don't understand why it isn't the model class corresponding to the internlm2-chat-1_8b I specified.

Looking further, the error turned out to be in a seemingly unimportant get_token_len function, so I replaced it with the get_token_len implementation from the neighboring opencompass/opencompass/models/huggingface.py, and the run then went through...

This may not be the correct fix, but it works. If there is an official answer or a more reliable solution, please follow up here.
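
For reference, the replacement amounts to roughly the following: count tokens on the raw prompt string instead of going through the chat template. This is a sketch mirroring the implementation in opencompass/opencompass/models/huggingface.py, a workaround rather than an official fix:

    def get_token_len(self, prompt: str) -> int:
        """Tokenize the raw prompt string directly, bypassing
        apply_chat_template (which assumes chat-style message dicts)."""
        return len(self.tokenizer.encode(prompt))

A less invasive alternative may be the --hf-type base flag that newer OpenCompass versions expose on run.py, which selects the base (non-chat) model wrapper and so skips the chat-template path entirely; I haven't verified whether the pinned 0.2.4 build already supports it.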

SefaZeng commented 6 months ago

> Ran into the same problem... my script is as follows:
>
> python run.py --datasets ceval_gen --hf-path /data/ptm/internlm2-chat-1_8b --tokenizer-path /data/ptm/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug
>
> Digging into the error, the huggingface_above_v4_33.py part looked odd. I printed the model_cfg at line 73 of opencompass/opencompass/tasks/openicl_infer.py and found that its type is opencompass.models.huggingface_above_v4_33.HuggingFacewithChatTemplate; I don't understand why it isn't the model class corresponding to the internlm2-chat-1_8b I specified.
>
> Looking further, the error turned out to be in a seemingly unimportant get_token_len function, so I replaced it with the get_token_len implementation from the neighboring opencompass/opencompass/models/huggingface.py, and the run then went through...
>
> This may not be the correct fix, but it works. If there is an official answer or a more reliable solution, please follow up here.

Honestly, this repo feels like it has quite a few bugs...