open-compass / opencompass

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
https://opencompass.org.cn/
Apache License 2.0
4.07k stars · 429 forks

[Bug] OpenICLInfer fail #1040

Closed yileitu closed 6 months ago

yileitu commented 7 months ago

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
 'CUDA_HOME': '/cluster/apps/gcc-9.3.0/cuda-12.1.1-wm3izjnq446qhwfu346qw77pcygwuu43',
 'GCC': 'gcc (GCC) 9.3.0',
 'GPU 0': 'Tesla V100-SXM2-32GB',
 'MMEngine': '0.10.3',
 'MUSA available': False,
 'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105',
 'OpenCV': '4.9.0',
 'PyTorch': '2.2.2',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 9.3\n'
                              '  - C++ Version: 201703\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2023.1-Product Build 20230303 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v3.3.2 (Git Hash '
                              '2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX2\n'
                              '  - CUDA Runtime 12.1\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
                              '  - CuDNN 8.9.2\n'
                              '  - Magma 2.6.1\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
                              'CUDNN_VERSION=8.9.2, '
                              'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
                              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
                              '-fabi-version=11 -fvisibility-inlines-hidden '
                              '-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
                              '-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-O2 -fPIC -Wall -Wextra -Werror=return-type '
                              '-Werror=non-virtual-dtor -Werror=bool-operation '
                              '-Wnarrowing -Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-strict-overflow -Wno-strict-aliasing '
                              '-Wno-stringop-overflow -Wsuggest-override '
                              '-Wno-psabi -Wno-error=pedantic '
                              '-Wno-error=old-style-cast -Wno-missing-braces '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, '
                              'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, '
                              'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, '
                              'USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, '
                              'USE_ROCM_KERNEL_ASSERT=OFF, \n',
 'Python': '3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]',
 'TorchVision': '0.17.2',
 'numpy_random_seed': 2147483648,
 'opencompass': '0.2.3+16f29b2',
 'sys.platform': 'linux'}

Reproduces the problem - code/configuration sample

NA

Reproduces the problem - command or script

#!/bin/bash -l

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --time=24:00:00
#SBATCH --mem-per-cpu=16384
#SBATCH --gpus=v100:1

module load eth_proxy
module load gcc/9.3.0
module load cuda/12.1.1
conda activate opencompass

python run.py \
  --models llama2_7b \
  --datasets tydiqa_gen

The configuration before the python run.py command is for my school's Slurm cluster. I made sure this setup is correct and valid, as verified in other projects; here I requested a single V100 card.

Reproduces the problem - error message

04/11 09:19:42 - OpenCompass - INFO - Loading tydiqa_gen: configs/datasets/tydiqa/tydiqa_gen.py
04/11 09:19:43 - OpenCompass - INFO - Loading llama2_7b: configs/models/llama/llama2_7b.py
04/11 09:19:43 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
04/11 09:19:43 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/11 09:19:44 - OpenCompass - INFO - Partitioned into 4 tasks.
  0% 0/4 [00:00<?, ?it/s]
launch OpenICLInfer[llama-2-7b/tydiqa-goldp_arabic,llama-2-7b/tydiqa-goldp_russian] on GPU 0
04/11 09:19:49 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[llama-2-7b/tydiqa-goldp_arabic,llama-2-7b/tydiqa-goldp_russian] fail, see ./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_arabic.out
 25% ██▌ 1/4 [00:06<00:15, 5.22s/it]
launch OpenICLInfer[llama-2-7b/tydiqa-goldp_japanese,llama-2-7b/tydiqa-goldp_english,llama-2-7b/tydiqa-goldp_korean,llama-2-7b/tydiqa-goldp_bengali] on GPU 0
04/11 09:19:53 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[llama-2-7b/tydiqa-goldp_japanese,llama-2-7b/tydiqa-goldp_english,llama-2-7b/tydiqa-goldp_korean,llama-2-7b/tydiqa-goldp_bengali] fail, see ./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_japanese.out
 50% █████ 2/4 [00:10<00:09, 4.52s/it]
launch OpenICLInfer[llama-2-7b/tydiqa-goldp_telugu,llama-2-7b/tydiqa-goldp_indonesian,llama-2-7b/tydiqa-goldp_swahili] on GPU 0
04/11 09:19:58 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[llama-2-7b/tydiqa-goldp_telugu,llama-2-7b/tydiqa-goldp_indonesian,llama-2-7b/tydiqa-goldp_swahili] fail, see ./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_telugu.out
 75% ███████▌ 3/4 [00:14<00:04, 4.52s/it]
launch OpenICLInfer[llama-2-7b/tydiqa-goldp_thai,llama-2-7b/tydiqa-goldp_finnish] on GPU 0
04/11 09:20:02 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[llama-2-7b/tydiqa-goldp_thai,llama-2-7b/tydiqa-goldp_finnish] fail, see ./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_thai.out
100% ██████████ 4/4 [00:17<00:00, 4.50s/it]
04/11 09:20:02 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/base.py - summarize - 64 - OpenICLInfer[llama-2-7b/tydiqa-goldp_arabic,llama-2-7b/tydiqa-goldp_russian] failed with code 1
04/11 09:20:02 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/base.py - summarize - 64 - OpenICLInfer[llama-2-7b/tydiqa-goldp_thai,llama-2-7b/tydiqa-goldp_finnish] failed with code 1
04/11 09:20:02 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/base.py - summarize - 64 - OpenICLInfer[llama-2-7b/tydiqa-goldp_telugu,llama-2-7b/tydiqa-goldp_indonesian,llama-2-7b/tydiqa-goldp_swahili] failed with code 1
04/11 09:20:02 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/base.py - summarize - 64 - OpenICLInfer[llama-2-7b/tydiqa-goldp_japanese,llama-2-7b/tydiqa-goldp_english,llama-2-7b/tydiqa-goldp_korean,llama-2-7b/tydiqa-goldp_bengali] failed with code 1
04/11 09:20:02 - OpenCompass - INFO - Partitioned into 11 tasks.
100% ██████████ 11/11 [02:34<00:00, 14.07s/it]
launch OpenICLEval[llama-2-7b/tydiqa-goldp_arabic] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_bengali] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_english] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_finnish] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_indonesian] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_japanese] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_korean] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_russian] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_swahili] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_telugu] on CPU
launch OpenICLEval[llama-2-7b/tydiqa-goldp_thai] on CPU

dataset                  version  metric  mode  llama-2-7b
tydiqa-goldp_arabic      -        -       -     -
tydiqa-goldp_bengali     -        -       -     -
tydiqa-goldp_english     -        -       -     -
tydiqa-goldp_finnish     -        -       -     -
tydiqa-goldp_indonesian  -        -       -     -
tydiqa-goldp_japanese    -        -       -     -
tydiqa-goldp_korean      -        -       -     -
tydiqa-goldp_russian     -        -       -     -
tydiqa-goldp_swahili     -        -       -     -
tydiqa-goldp_telugu      -        -       -     -
tydiqa-goldp_thai        -        -       -     -

04/11 09:22:37 - OpenCompass - INFO - write summary to /cluster/project/sachan/yilei/projects/opencompass/outputs/default/20240411_091943/summary/summary_20240411_091943.txt
04/11 09:22:37 - OpenCompass - INFO - write csv to /cluster/project/sachan/yilei/projects/opencompass/outputs/default/20240411_091943/summary/summary_20240411_091943.csv

Other information

Dataset: tydiqa_gen

I'm working on a multilingual LLM project and learned that OpenCompass is convenient for evaluation, so I gave it a try for the first time. I tried both TyDiQA and XCOPA, and both reported the OpenICLInfer error.

yileitu commented 7 months ago

I forgot to attach ./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_arabic.out. It reads:

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

All other tydiqa-goldp_xx.out files read the same as above.
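The error message itself suggests two workarounds: import numpy before any libgomp-linked library (such as PyTorch), or force the Intel threading layer. A minimal sketch of the environment-variable route (an illustration of the message's hint, not OpenCompass's documented fix; whether it resolves the crash depends on the build):

```python
import os

# Must run before numpy/torch are first imported; once mkl-service has
# chosen a threading layer, changing the variable has no effect.
os.environ["MKL_SERVICE_FORCE_INTEL"] = "1"

# Only after this point import MKL-linked libraries, e.g.:
# import numpy
# import torch
```

The key design point is ordering: the assignment has to sit at the very top of the entry script, ahead of any import that pulls in MKL.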

Ethan-9606 commented 7 months ago

I got the same error...

kkkparty commented 7 months ago

I got the same error...

Please let us know when it is fixed.

IcyFeather233 commented 7 months ago

try export MKL_SERVICE_FORCE_INTEL=1 and run again

kkkparty commented 7 months ago

try export MKL_SERVICE_FORCE_INTEL=1 and run again

It doesn't work.

seanxuu commented 7 months ago

Same error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

seanxuu commented 7 months ago

try export MKL_SERVICE_FORCE_INTEL=1 and run again

It doesn't work.

https://github.com/pytorch/pytorch/issues/37377#issuecomment-1825175772

yileitu commented 6 months ago

Dear team, any updates? This bug seems to be associated exclusively with certain datasets such as TyDiQA and XCOPA. I can run the example script below successfully, with meaningful outputs, in the same setting and environment.

python run.py --models hf_opt_125m hf_opt_350m --datasets siqa_gen winograd_ppl

bittersweet1999 commented 6 months ago

How about

export MKL_THREADING_LAYER=GNU
export MKL_SERVICE_FORCE_INTEL=1

bittersweet1999 commented 6 months ago

Also, please check your environment: whether PyTorch and transformers are up to date, and whether you are running on Linux.

yileitu commented 6 months ago

How about export MKL_THREADING_LAYER=GNU export MKL_SERVICE_FORCE_INTEL=1

I tried that; it doesn't work.

And please check your environment, whether updated Pytorch, transformers, and whether running on Linux

The libraries are up to date, and yes, it is indeed running on Linux. Is there anything particular I should watch out for on Linux?

Or could you provide a script that you or the admins have verified runs the TyDiQA evaluation successfully? (Any model would be fine.) I can try to reproduce it in my environment and find the differences; I think that is the fastest way to resolve this issue.

bittersweet1999 commented 6 months ago

from mmengine.config import read_base

from opencompass.models import HuggingFaceCausalLM
from opencompass.partitioners import NaivePartitioner
from opencompass.partitioners.sub_naive import SubjectiveNaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask
from opencompass.tasks.subjective_eval import SubjectiveEvalTask

with read_base():
    from .datasets.tydiqa.tydiqa_gen import tydiqa_datasets
    from .models.hf_internlm.hf_internlm2_chat_7b import models
datasets = [*tydiqa_datasets]

_meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<|im_start|>user\n', end='<|im_end|>\n'),
        dict(role='BOT', begin='<|im_start|>assistant\n', end='<|im_end|>\n', generate=True),
    ],
)

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='internlm2-chat-7b-hf',
        path="internlm/internlm2-chat-7b",
        tokenizer_path='internlm/internlm2-chat-7b',
        model_kwargs=dict(
            trust_remote_code=True,
            device_map='auto',
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8,
        meta_template=_meta_template,
        run_cfg=dict(num_gpus=1, num_procs=1),
        end_str='<|im_end|>',
        generation_kwargs=dict(eos_token_id=[2, 92542], do_sample=True),
        batch_padding=True,
    )
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=256,
        task=dict(type=OpenICLInferTask)),
)

work_dir = 'outputs/test/'

Hi, here is my config; I ran it with the script below:

conda activate opencompass
export MKL_SERVICE_FORCE_INTEL=1
export HF_EVALUATE_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_ENDPOINT=https://hf-mirror.com
export TRANSFORMERS_CACHE='my cache dir'
python run.py configs/eval_my_config.py --mode all --reuse latest

yileitu commented 6 months ago

The same error happens with your config:

04/28 19:40:37 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[internlm2-chat-7b-hf/tydiqa-goldp_arabic] fail, see
outputs/test/20240428_193926/logs/infer/internlm2-chat-7b-hf/tydiqa-goldp_arabic.out

It seems to be a Linux-specific problem.

bittersweet1999 commented 6 months ago

I am also running on a Linux platform. After checking the environment (PyTorch, transformers), the only difference is the GPU: I used an A100 80G.

yileitu commented 6 months ago

Most probably not a GPU problem. I tested it on an A100 80G and still got the same error.

yileitu commented 6 months ago

For those who encounter the same OpenICLInfer error: reinstalling numpy in the opencompass conda env worked for me! In addition, I added two env vars:

export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
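For setups where exporting variables in the submission script is inconvenient, the same two settings can be applied at the top of a small Python wrapper, before anything MKL-linked is imported. This is a sketch; the commented-out runpy hand-off to run.py is hypothetical and assumes the wrapper lives next to OpenCompass's run.py.

```python
import os

# Apply the fix before any MKL-linked import (numpy, torch, opencompass),
# otherwise the threading layer is already fixed and these are ignored.
os.environ["MKL_SERVICE_FORCE_INTEL"] = "1"
os.environ["MKL_THREADING_LAYER"] = "GNU"

# Hypothetical hand-off: invoke OpenCompass's CLI in the same interpreter.
# import runpy, sys
# sys.argv = ["run.py", "--models", "llama2_7b", "--datasets", "tydiqa_gen"]
# runpy.run_path("run.py", run_name="__main__")
```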