Open shuangpe opened 1 year ago

I'd like to benchmark the LLAMA2 model's optimized performance with BigDL on an SPR machine. I downloaded the BigDL repo and executed `run-spr.sh`, but it reported the error messages shown below when executing the inference statement. I used the latest main branch of the BigDL repo and passed in my local model directory `llama2-7b-chat`. Could you help debug the issue?
See #9168
Not the same issue: he can import and convert the model; the segmentation fault happens during the forward pass.
What's the OS/glibc version used?
```
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

Linux rypq-14 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
ldd (Ubuntu GLIBC 2.35-0ubuntu3.4) 2.35
```
Hi @shuangpe, we cannot reproduce this issue on our SPR machine. Can you share your config.yaml with us?
The last commit in my BigDL repo is:

```
commit 8f78ae109943d08a0a7635ad99d60d899204c91b (HEAD -> main, origin/main, origin/HEAD)
Author: binbin Deng <108676127+plusbang@users.noreply.github.com>
Date:   Thu Oct 19 18:40:48 2023 +0800

    LLM: improve gpu supports key feature doc page (#9212)
```
The config.yaml I'm using is:

```yaml
repo_id:
  - 'daryl149/llama-2-7b-chat-hf'
local_model_hub:
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
in_out_pairs:
  - '32-32'
  - '1024-128'
test_api:
  - "transformer_int4"
  - "native_int4"
  - "optimize_model"
  - "pytorch_autocast_bf16"
  # - "ipex_fp16_gpu" # on Intel GPU
  # - "transformer_int4_gpu" # on Intel GPU
  # - "optimize_model_gpu" # on Intel GPU
```
Here is the full console output:

```
# bash run-spr.sh
Sourcing bigdl-nano-init in: /root/miniconda3/envs/bigdl_llm/bin
Setting OMP_NUM_THREADS...
Setting KMP_AFFINITY...
Setting KMP_BLOCKTIME...
Setting jemalloc...
nano_vars.sh already exists
+++++ Env Variables +++++
LD_PRELOAD            = /root/miniconda3/envs/bigdl_llm/lib/libiomp5.so /root/miniconda3/envs/bigdl_llm/lib/python3.9/site-packages/bigdl/nano/libs/libjemalloc.so
MALLOC_CONF           = oversize_threshold:1,background_thread:false,metadata_thp:always,dirty_decay_ms:-1,muzzy_decay_ms:-1
OMP_NUM_THREADS       = 56
KMP_AFFINITY          = granularity=fine,none
KMP_BLOCKTIME         = 1
TF_ENABLE_ONEDNN_OPTS = 1
+++++++++++++++++++++++++
Complete.
(…)ma-2-7b-chat-hf/resolve/main/config.json: 100%|██████████| 507/507 [00:00<00:00, 134kB/s]
(…)esolve/main/pytorch_model.bin.index.json: 100%|██████████| 26.8k/26.8k [00:00<00:00, 12.4MB/s]
pytorch_model-00001-of-00002.bin: 100%|██████████| 9.98G/9.98G [02:04<00:00, 80.0MB/s]
pytorch_model-00002-of-00002.bin: 100%|██████████| 3.50G/3.50G [00:43<00:00, 80.8MB/s]
Downloading shards: 100%|██████████| 2/2 [02:48<00:00, 84.28s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00, 3.84s/it]
(…)t-hf/resolve/main/generation_config.json: 100%|██████████| 137/137 [00:00<00:00, 23.4kB/s]
(…)at-hf/resolve/main/tokenizer_config.json: 100%|██████████| 727/727 [00:00<00:00, 192kB/s]
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 121MB/s]
(…)2-7b-chat-hf/resolve/main/tokenizer.json: 100%|██████████| 1.84M/1.84M [00:00<00:00, 7.38MB/s]
(…)-hf/resolve/main/special_tokens_map.json: 100%|██████████| 411/411 [00:00<00:00, 247kB/s]
>> loading of model costs 181.39399339724332s
<class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>
run-spr.sh: line 6: 1536912 Segmentation fault      (core dumped) numactl -C 0-47 -m 0 python $(dirname "$0")/run.py
```
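One way to confirm where the crash originates is to inspect the core dump; a minimal sketch (not from the original thread), assuming core dumps land in the working directory:

```bash
ulimit -c unlimited          # enable core dumps in this shell
bash run-spr.sh              # reproduce the segmentation fault
# the core file name/location depends on kernel.core_pattern
gdb -q -batch -ex bt "$(which python)" core
```

A backtrace pointing into `libjemalloc.so` would be consistent with the allocator-related fix discussed below.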
@shuangpe, we will try to reproduce the issue using the config.yaml you provided.
As synced with @shuangpe on Teams, we cannot reproduce this issue on our side. @shuangpe will try downloading meta-llama/Llama-2-7b-chat-hf from Hugging Face and run again.
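For reference, a minimal sketch of downloading the gated checkpoint with huggingface-cli (not from the original thread; requires an HF account that has accepted the Llama 2 license, and huggingface_hub >= 0.17):

```bash
huggingface-cli login    # paste an access token with read scope
huggingface-cli download meta-llama/Llama-2-7b-chat-hf \
    --local-dir ./Llama-2-7b-chat-hf
```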
Hi @shuangpe,
We reproduced this issue on one of our SPR servers. It seems to be caused by `LD_PRELOAD` and `libjemalloc.so`. Adding `unset LD_PRELOAD` to `run-spr.sh` solves the problem:
```bash
source bigdl-nano-init
export OMP_NUM_THREADS=48
# unset LD_PRELOAD from bigdl-nano-init
unset LD_PRELOAD
# set the following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py
```
Alternatively, we can reset `LD_PRELOAD` by overriding it, keeping only `libiomp5.so`:

```bash
export LD_PRELOAD=/home/spr/anaconda3/envs/qiyuan-llm/lib/libiomp5.so
```
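A quick sanity check (a sketch, not from the original thread) to confirm which allocator and OpenMP libraries actually end up in the benchmark process:

```bash
# Show what will be preloaded, then list matching libraries mapped
# into a fresh Python process.
echo "LD_PRELOAD=$LD_PRELOAD"
python - <<'EOF'
with open('/proc/self/maps') as f:
    libs = sorted({line.split()[-1] for line in f
                   if any(k in line for k in ('jemalloc', 'tcmalloc', 'iomp'))})
print(libs or 'no jemalloc/tcmalloc/iomp mapped')
EOF
```

After the fix, only `libiomp5.so` should appear; `libjemalloc.so` should be gone.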
Another solution is to modify `run-spr.sh`: change `source bigdl-nano-init` to `source bigdl-nano-init -t` or `source bigdl-llm-init -t`. This switches the allocator from jemalloc to tcmalloc; tcmalloc works well on the latest Ubuntu 22.04.
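Putting that change together, `run-spr.sh` would look like this (a sketch based on the script shown earlier in this thread):

```bash
source bigdl-llm-init -t   # -t switches the preloaded allocator from jemalloc to tcmalloc
export OMP_NUM_THREADS=48
# set the following parameters according to the actual specs of the test machine
numactl -C 0-47 -m 0 python $(dirname "$0")/run.py
```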
Hi @shuangpe,
Please create a new conda env, install bigdl-llm, and upgrade to the latest `run-spr.sh`. This issue has been resolved in the latest `run-spr.sh`.
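For example (a sketch; the env name is arbitrary, and `bigdl-llm[all]` follows BigDL's install docs):

```bash
conda create -n bigdl-llm-bench python=3.9 -y
conda activate bigdl-llm-bench
pip install --pre --upgrade bigdl-llm[all]   # CPU build with all optional dependencies
```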