intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Segmentation fault occurred when executing benchmark script on SPR machine #9202

Open shuangpe opened 1 year ago

shuangpe commented 1 year ago

I'd like to benchmark the LLaMA 2 model's optimized performance with BigDL on an SPR machine.

I then cloned the BigDL repo and ran run-spr.sh, but it reported the following error during the inference step:

run-spr.sh: line 6: 1107674 Segmentation fault      (core dumped) numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

I used the latest main branch of the BigDL repo and passed in my local model directory llama2-7b-chat. The commands are shown below:

git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one

conda create -y -n bigdl_llm python=3.9
conda activate bigdl_llm
pip install -U pip
pip install --pre --upgrade bigdl-llm[all]
pip install bigdl-nano[pytorch]
pip install omegaconf pandas
source bigdl-nano-init

##### modify config.yaml to specify the llama2-7b-chat model (a minimal sketch of this edit follows the commands below)

bash run-spr.sh
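For reference, the config.yaml edit mentioned above boils down to filling in local_model_hub and listing the model to test; a minimal sketch assuming a hypothetical local path (the full config actually used in this report is posted later in the thread):

# minimal sketch of config.yaml (hypothetical local path; not the exact file from this report)
repo_id:
  - 'daryl149/llama-2-7b-chat-hf'
local_model_hub: '/path/to/local/model/hub'   # hypothetical; directory that holds the downloaded model folder (leave empty to download from Hugging Face)
warm_up: 1
num_trials: 3
num_beams: 1          # greedy search
in_out_pairs:
  - '32-32'
test_api:
  - 'transformer_int4'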

Could you help debug this issue?

jason-dai commented 1 year ago

See https://github.com/intel-analytics/BigDL/issues/9168

hkvision commented 1 year ago

> See #9168

Not the same issue; he can import and convert the model, and the segmentation fault happens during the forward pass.
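For context, the failing step corresponds roughly to the generation call in a flow like the following sketch (assuming bigdl-llm's transformers-style API; the paths and prompt are illustrative, not the actual run.py code):

# rough sketch of the transformer_int4 flow (illustrative; not the exact benchmark code)
import torch
from transformers import LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM  # bigdl-llm's optimized loader

model_path = "/path/to/llama2-7b-chat"  # hypothetical local path

# loading and INT4 conversion succeed in the report above
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)

# the segmentation fault is reported here, during the first forward pass inside generate()
with torch.inference_mode():
    input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))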

jason-dai commented 1 year ago

> > See #9168
>
> Not the same issue; he can import and convert the model, and the segmentation fault happens during the forward pass.

What's the OS/glibc version used?

shuangpe commented 1 year ago

> What's the OS/glibc version used?

cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

uname -a

Linux rypq-14 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

gcc --version

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ldd --version

ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

liu-shaojun commented 1 year ago

Hi @shuangpe, we cannot reproduce this issue on our SPR machine. Could you share your config.yaml with us?

shuangpe commented 1 year ago

The latest commit in my local BigDL repo is:

commit 8f78ae109943d08a0a7635ad99d60d899204c91b (HEAD -> main, origin/main, origin/HEAD)
Author: binbin Deng <108676127+plusbang@users.noreply.github.com>
Date:   Thu Oct 19 18:40:48 2023 +0800

    LLM: improve gpu supports key feature doc page (#9212)

The config.yaml I'm using is:

repo_id:
  - 'daryl149/llama-2-7b-chat-hf'
local_model_hub:
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - "transformer_int4"
  - "native_int4"
  - "optimize_model"
  - "pytorch_autocast_bf16"
  # - "ipex_fp16_gpu" # on Intel GPU
  # - "transformer_int4_gpu"  # on Intel GPU
  # - "optimize_model_gpu"  # on Intel GPU

Here is the full console output:

# bash run-spr.sh 
Sourcing bigdl-nano-init in: /root/miniconda3/envs/bigdl_llm/bin
Setting OMP_NUM_THREADS...
Setting KMP_AFFINITY...
Setting KMP_BLOCKTIME...
Setting jemalloc...
nano_vars.sh already exists
+++++ Env Variables +++++
LD_PRELOAD            = /root/miniconda3/envs/bigdl_llm/lib/libiomp5.so /root/miniconda3/envs/bigdl_llm/lib/python3.9/site-packages/bigdl/nano/libs/libjemalloc.so
MALLOC_CONF           = oversize_threshold:1,background_thread:false,metadata_thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1
OMP_NUM_THREADS       = 56
KMP_AFFINITY          = granularity=fine,none
KMP_BLOCKTIME         = 1
TF_ENABLE_ONEDNN_OPTS = 1
+++++++++++++++++++++++++
Complete.
(…)ma-2-7b-chat-hf/resolve/main/config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 507/507 [00:00<00:00, 134kB/s]
(…)esolve/main/pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 12.4MB/s]
pytorch_model-00001-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [02:04<00:00, 80.0MB/s]
pytorch_model-00002-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:43<00:00, 80.8MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:48<00:00, 84.28s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.84s/it]
(…)t-hf/resolve/main/generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 23.4kB/s]
(…)at-hf/resolve/main/tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 727/727 [00:00<00:00, 192kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 121MB/s]
(…)2-7b-chat-hf/resolve/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.38MB/s]
(…)-hf/resolve/main/special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 247kB/s]
>> loading of model costs 181.39399339724332s
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
run-spr.sh: line 6: 1536912 Segmentation fault      (core dumped) numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

liu-shaojun commented 1 year ago

@shuangpe, we will try to reproduce the issue using the config.yaml you provided.

liu-shaojun commented 1 year ago

As synced with @shuangpe over Teams, we cannot reproduce this issue on our side. @shuangpe will try downloading meta-llama/Llama-2-7b-chat-hf from Hugging Face and try again.
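One way to pull that model down locally is via the huggingface_hub CLI; a sketch, assuming the CLI is installed and the account used for login has accepted the gated meta-llama license:

# sketch: download the official checkpoint into a local folder
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir ./llama2-7b-chat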

qiyuangong commented 12 months ago

Hi @shuangpe

We reproduced this issue on one of our SPR servers. It seems to be caused by LD_PRELOAD and libjemalloc.so.

Adding unset LD_PRELOAD to run-spr.sh solves the problem:

source bigdl-nano-init
export OMP_NUM_THREADS=48
# unset LD_PRELOAD from bigdl-nano-init
unset LD_PRELOAD
# set following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

Alternatively, we can reset LD_PRELOAD by overriding it so that only libiomp5.so is preloaded:

export LD_PRELOAD=/home/spr/anaconda3/envs/qiyuan-llm/lib/libiomp5.so

qiyuangong commented 11 months ago

Another solution is to modify run-spr.sh:

Change source bigdl-nano-init to source bigdl-nano-init -t or source bigdl-llm-init -t.

This switches the allocator from jemalloc to tcmalloc; tcmalloc works well on the latest Ubuntu 22.04.
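A sketch of run-spr.sh with that change applied (the structure follows the snippet earlier in this thread; adjust the core count and numactl binding to your machine):

#!/bin/bash
# sketch of the modified run-spr.sh: switch the preloaded allocator from jemalloc to tcmalloc
source bigdl-llm-init -t        # -t selects tcmalloc; 'source bigdl-nano-init -t' also works per this thread
export OMP_NUM_THREADS=48
# set the following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py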

qiyuangong commented 11 months ago

Hi @shuangpe, please create a new conda env, install bigdl-llm, and upgrade to the latest run-spr.sh. This issue has been resolved in the latest run-spr.sh.
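For completeness, a sketch of that fresh setup, reusing the install steps from the top of this thread (exact package extras and paths may differ in the latest repo):

# sketch: fresh conda env + latest benchmark script
conda create -y -n bigdl_llm_new python=3.9
conda activate bigdl_llm_new
pip install --pre --upgrade bigdl-llm[all]

cd BigDL/python/llm/dev/benchmark/all-in-one
git pull                        # pick up the latest run-spr.sh
bash run-spr.sh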