intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Segmentation fault occurred when executing benchmark script on SPR machine #9202

Open shuangpe opened 1 year ago

shuangpe commented 1 year ago

I'd like to benchmark the LLaMA 2 model's optimized performance with BigDL on an SPR machine.

I then cloned the BigDL repo and ran run-spr.sh, but it reported the following error during the inference step:

run-spr.sh: line 6: 1107674 Segmentation fault      (core dumped) numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

I used the latest main branch of the BigDL repo and passed in my local model directory llama2-7b-chat. The commands are shown below:

git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one

conda create -y -n bigdl_llm python=3.9
conda activate bigdl_llm
pip install -U pip
pip install --pre --upgrade bigdl-llm[all]
pip install bigdl-nano[pytorch]
pip install omegaconf pandas
source bigdl-nano-init

##### modify config.yaml to specify the llama2-7b-chat model (a minimal sketch of this edit follows the commands below)

bash run-spr.sh
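For reference, the config.yaml edit mentioned above boils down to filling in local_model_hub and listing the model to test; a minimal sketch assuming a hypothetical local path (the full config actually used in this report is posted later in the thread):

# minimal sketch of config.yaml (hypothetical local path; not the exact file from this report)
repo_id:
  - 'daryl149/llama-2-7b-chat-hf'
local_model_hub: '/path/to/local/model/hub'   # hypothetical; directory that holds the downloaded model folder (leave empty to download from Hugging Face)
warm_up: 1
num_trials: 3
num_beams: 1          # greedy search
in_out_pairs:
  - '32-32'
test_api:
  - 'transformer_int4'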

Could you help debug this issue?

jason-dai commented 1 year ago

See https://github.com/intel-analytics/BigDL/issues/9168

hkvision commented 1 year ago

> See #9168

Not the same issue; he can import and convert the model, and the segmentation fault happens during the forward pass.
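For context, the failing step corresponds roughly to the generation call in a flow like the following sketch (assuming bigdl-llm's transformers-style API; the paths and prompt are illustrative, not the actual run.py code):

# rough sketch of the transformer_int4 flow (illustrative; not the exact benchmark code)
import torch
from transformers import LlamaTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM  # bigdl-llm's optimized loader

model_path = "/path/to/llama2-7b-chat"  # hypothetical local path

# loading and INT4 conversion succeed in the report above
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)

# the segmentation fault is reported here, during the first forward pass inside generate()
with torch.inference_mode():
    input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))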

jason-dai commented 1 year ago

> > See #9168
>
> Not the same issue; he can import and convert the model, and the segmentation fault happens during the forward pass.

What's the OS/glibc version used?

shuangpe commented 1 year ago

> What's the OS/glibc version used?

cat /etc/lsb-release

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

uname -a

Linux rypq-14 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

gcc --version

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

ldd --version

ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

liu-shaojun commented 1 year ago

Hi @shuangpe, we cannot reproduce this issue on our SPR machine. Could you share your config.yaml with us?

shuangpe commented 1 year ago

The latest commit in my local BigDL repo is:

commit 8f78ae109943d08a0a7635ad99d60d899204c91b (HEAD -> main, origin/main, origin/HEAD)
Author: binbin Deng <108676127+plusbang@users.noreply.github.com>
Date:   Thu Oct 19 18:40:48 2023 +0800

    LLM: improve gpu supports key feature doc page (#9212)

The config.yaml I'm using is:

repo_id:
  - 'daryl149/llama-2-7b-chat-hf'
local_model_hub:
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - "transformer_int4"
  - "native_int4"
  - "optimize_model"
  - "pytorch_autocast_bf16"
  # - "ipex_fp16_gpu" # on Intel GPU
  # - "transformer_int4_gpu"  # on Intel GPU
  # - "optimize_model_gpu"  # on Intel GPU

Here is the full console output:

# bash run-spr.sh 
Sourcing bigdl-nano-init in: /root/miniconda3/envs/bigdl_llm/bin
Setting OMP_NUM_THREADS...
Setting KMP_AFFINITY...
Setting KMP_BLOCKTIME...
Setting jemalloc...
nano_vars.sh already exists
+++++ Env Variables +++++
LD_PRELOAD            = /root/miniconda3/envs/bigdl_llm/lib/libiomp5.so /root/miniconda3/envs/bigdl_llm/lib/python3.9/site-packages/bigdl/nano/libs/libjemalloc.so
MALLOC_CONF           = oversize_threshold:1,background_thread:false,metadata_thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1
OMP_NUM_THREADS       = 56
KMP_AFFINITY          = granularity=fine,none
KMP_BLOCKTIME         = 1
TF_ENABLE_ONEDNN_OPTS = 1
+++++++++++++++++++++++++
Complete.
(…)ma-2-7b-chat-hf/resolve/main/config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 507/507 [00:00<00:00, 134kB/s]
(…)esolve/main/pytorch_model.bin.index.json: 100%|██████████████████████████████████████████████████████████████████████████| 26.8k/26.8k [00:00<00:00, 12.4MB/s]
pytorch_model-00001-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████| 9.98G/9.98G [02:04<00:00, 80.0MB/s]
pytorch_model-00002-of-00002.bin: 100%|█████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [00:43<00:00, 80.8MB/s]
Downloading shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:48<00:00, 84.28s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.84s/it]
(…)t-hf/resolve/main/generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 137/137 [00:00<00:00, 23.4kB/s]
(…)at-hf/resolve/main/tokenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████████| 727/727 [00:00<00:00, 192kB/s]
tokenizer.model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 121MB/s]
(…)2-7b-chat-hf/resolve/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.38MB/s]
(…)-hf/resolve/main/special_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 247kB/s]
>> loading of model costs 181.39399339724332s
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
run-spr.sh: line 6: 1536912 Segmentation fault      (core dumped) numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

liu-shaojun commented 1 year ago

@shuangpe, we will try to reproduce the issue using the config.yaml you provided.

liu-shaojun commented 1 year ago

As synced with @shuangpe over Teams, we cannot reproduce this issue on our side. @shuangpe will try downloading meta-llama/Llama-2-7b-chat-hf from Hugging Face and try again.
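One way to pull that model down locally is via the huggingface_hub CLI; a sketch, assuming the CLI is installed and the account used for login has accepted the gated meta-llama license:

# sketch: download the official checkpoint into a local folder
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download meta-llama/Llama-2-7b-chat-hf --local-dir ./llama2-7b-chat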

qiyuangong commented 12 months ago

Hi @shuangpe

We reproduced this issue on one of our SPR servers. It seems to be caused by LD_PRELOAD and libjemalloc.so.

Adding unset LD_PRELOAD to run-spr.sh solves the problem:

source bigdl-nano-init
export OMP_NUM_THREADS=48
# unset LD_PRELOAD from bigdl-nano-init
unset LD_PRELOAD
# set following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py

Alternatively, we can reset LD_PRELOAD by overriding it so that only libiomp5.so is preloaded:

export LD_PRELOAD=/home/spr/anaconda3/envs/qiyuan-llm/lib/libiomp5.so

qiyuangong commented 11 months ago

Another solution is to modify run-spr.sh:

Change source bigdl-nano-init to source bigdl-nano-init -t or source bigdl-llm-init -t.

This switches the allocator from jemalloc to tcmalloc; tcmalloc works well on the latest Ubuntu 22.04.
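A sketch of run-spr.sh with that change applied (the structure follows the snippet earlier in this thread; adjust the core count and numactl binding to your machine):

#!/bin/bash
# sketch of the modified run-spr.sh: switch the preloaded allocator from jemalloc to tcmalloc
source bigdl-llm-init -t        # -t selects tcmalloc; 'source bigdl-nano-init -t' also works per this thread
export OMP_NUM_THREADS=48
# set the following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py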

qiyuangong commented 11 months ago

Hi @shuangpe, please create a new conda env, install bigdl-llm, and upgrade to the latest run-spr.sh. This issue has been resolved in the latest run-spr.sh.
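For completeness, a sketch of that fresh setup, reusing the install steps from the top of this thread (exact package extras and paths may differ in the latest repo):

# sketch: fresh conda env + latest benchmark script
conda create -y -n bigdl_llm_new python=3.9
conda activate bigdl_llm_new
pip install --pre --upgrade bigdl-llm[all]

cd BigDL/python/llm/dev/benchmark/all-in-one
git pull                        # pick up the latest run-spr.sh
bash run-spr.sh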