Forgot to attach `./outputs/default/20240411_091943/logs/infer/llama-2-7b/tydiqa-goldp_arabic.out`. It reads:

```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```
All the other `tydiqa-goldp_xx.out` files read the same.
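As an aside, the error text itself points at the usual workaround: choose the MKL threading layer before anything loads MKL. A minimal sketch of that idea (the variable names come straight from the error message; putting this at the very top of the entry script is my assumption):

```python
# Sketch: set the MKL threading-layer env vars before numpy/torch are
# imported -- they only take effect if set before MKL is loaded.
import os

os.environ.setdefault('MKL_THREADING_LAYER', 'GNU')    # use the GNU (libgomp) layer
os.environ.setdefault('MKL_SERVICE_FORCE_INTEL', '1')  # or make mkl-service tolerate the Intel layer

import numpy   # noqa: E402  (imported only after the env vars are set)
import torch   # noqa: E402
```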
I got the same error. Please let me know when it is fixed.
Try `export MKL_SERVICE_FORCE_INTEL=1` and run again.
> Try `export MKL_SERVICE_FORCE_INTEL=1` and run again.

It doesn't work. Same error:

```
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library. Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
```
> Try `export MKL_SERVICE_FORCE_INTEL=1` and run again.
>
> It doesn't work.

This may help: https://github.com/pytorch/pytorch/issues/37377#issuecomment-1825175772
Dear team, any updates? This bug seems to be exclusively associated with certain datasets such as tydiqa and XCOPA. I can run the example script successfully, with meaningful outputs, in the same setting and environment:

```bash
python run.py --models hf_opt_125m hf_opt_350m --datasets siqa_gen winograd_ppl
```
How about

```bash
export MKL_THREADING_LAYER=GNU
export MKL_SERVICE_FORCE_INTEL=1
```

Also, please check your environment: are PyTorch and transformers up to date, and are you running on Linux?
> How about
>
> ```bash
> export MKL_THREADING_LAYER=GNU
> export MKL_SERVICE_FORCE_INTEL=1
> ```

It doesn't work. I tried.

> And please check your environment: are PyTorch and transformers up to date, and are you running on Linux?

The libraries are up to date. Yes, it is indeed running on Linux. Is there anything in particular I should watch out for on Linux?
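In case it helps the comparison, here is a quick way to dump the relevant environment info (a sketch, not OpenCompass-specific; the `platform`, `torch`, and `transformers` attributes below are all standard):

```python
# Print the environment details relevant to this issue, for comparison.
import platform

import numpy as np
import torch
import transformers

print('OS:          ', platform.platform())        # confirm Linux distro/kernel
print('Python:      ', platform.python_version())
print('numpy:       ', np.__version__)
print('torch:       ', torch.__version__)
print('CUDA (torch):', torch.version.cuda)         # CUDA version torch was built against
print('transformers:', transformers.__version__)
```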
Or could you provide a script that you/the admins have verified runs the TyDiQA evaluation successfully? (Any model would be fine.) I can try to reproduce it in my environment and find the differences. I think this is the fastest way to resolve this issue.
```python
from mmengine.config import read_base
from opencompass.models import HuggingFaceCausalLM
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    from .datasets.tydiqa.tydiqa_gen import tydiqa_datasets
    from .models.hf_internlm.hf_internlm2_chat_7b import models

datasets = [*tydiqa_datasets]

# Chat template for internlm2-chat.
_meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<|im_start|>user\n', end='<|im_end|>\n'),
        dict(role='BOT', begin='<|im_start|>assistant\n', end='<|im_end|>\n',
             generate=True),
    ],
)

# Override the `models` imported above with an explicit definition.
models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='internlm2-chat-7b-hf',
        path='internlm/internlm2-chat-7b',
        tokenizer_path='internlm/internlm2-chat-7b',
        model_kwargs=dict(
            trust_remote_code=True,
            device_map='auto',
        ),
        tokenizer_kwargs=dict(
            padding_side='left',
            truncation_side='left',
            use_fast=False,
            trust_remote_code=True,
        ),
        max_out_len=2048,
        max_seq_len=2048,
        batch_size=8,
        meta_template=_meta_template,
        run_cfg=dict(num_gpus=1, num_procs=1),
        end_str='<|im_end|>',
        generation_kwargs={'eos_token_id': [2, 92542], 'do_sample': True},
        batch_padding=True,
    )
]

infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=256,
        task=dict(type=OpenICLInferTask),
    ),
)

work_dir = 'outputs/test/'
```
Hi, the above is my config, and I ran it with the script below:
```bash
conda activate opencompass
export MKL_SERVICE_FORCE_INTEL=1
export HF_EVALUATE_OFFLINE=1
export HF_DATASETS_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_ENDPOINT=https://hf-mirror.com
export TRANSFORMERS_CACHE='my cache dir'
python run.py configs/eval_my_config.py --mode all --reuse latest
```
The same error happens with your config:

```
04/28 19:40:37 - OpenCompass - ERROR - /cluster/project/sachan/yilei/projects/opencompass/opencompass/runners/local.py - _launch - 192 - task OpenICLInfer[internlm2-chat-7b-hf/tydiqa-goldp_arabic] fail, see
outputs/test/20240428_193926/logs/infer/internlm2-chat-7b-hf/tydiqa-goldp_arabic.out
```

It seems like a Linux-specific problem.
I am also running on a Linux platform. After checking the environment (PyTorch, transformers, etc.), the only difference between us is the GPU: I used an A100 80G.
Most probably not a GPU problem; I tested on an A100 80G and still got the same error.
For those who encounter the same OpenICLInfer error: reinstalling numpy in the opencompass conda env worked for me! In addition, I set two env vars:

```bash
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
```
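To check that the reinstalled numpy is actually the one being picked up, here is a quick sanity check (a sketch; `np.show_config()` is a standard NumPy call):

```python
# Confirm which numpy build is loaded and what BLAS backend it uses
# (MKL vs OpenBLAS is what this error hinges on).
import numpy as np

print(np.__version__)  # version of the freshly reinstalled numpy
print(np.__file__)     # path: should resolve inside the opencompass env
np.show_config()       # BLAS/LAPACK build info (e.g. mkl vs openblas)
```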
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Reproduces the problem - code/configuration sample
NA
Reproduces the problem - command or script

The config preceding `python run.py xxxxx` is for my school's Slurm cluster. I made sure that config was correct and valid, as verified in other projects; here I requested a single V100 card.

Reproduces the problem - error message
```
tydiqa-goldp_arabic      -  -  -  -
tydiqa-goldp_bengali     -  -  -  -
tydiqa-goldp_english     -  -  -  -
tydiqa-goldp_finnish     -  -  -  -
tydiqa-goldp_indonesian  -  -  -  -
tydiqa-goldp_japanese    -  -  -  -
tydiqa-goldp_korean      -  -  -  -
tydiqa-goldp_russian     -  -  -  -
tydiqa-goldp_swahili     -  -  -  -
tydiqa-goldp_telugu      -  -  -  -
tydiqa-goldp_thai        -  -  -  -
04/11 09:22:37 - OpenCompass - INFO - write summary to /cluster/project/sachan/yilei/projects/opencompass/outputs/default/20240411_091943/summary/summary_20240411_091943.txt
04/11 09:22:37 - OpenCompass - INFO - write csv to /cluster/project/sachan/yilei/projects/opencompass/outputs/default/20240411_091943/summary/summary_20240411_091943.csv
```
Other information

Dataset: tydiqa_gen

I'm working on a multilingual LLM project and learned that opencompass is convenient for evaluation, so I gave it a try for the first time. I tried both tydiqa and XCOPA, and both reported an OpenICLInfer error.