intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Inference of chatGLM2-6b with BigDL-LLM INT4 failed #9336

Open Fred-cell opened 10 months ago

Fred-cell commented 10 months ago

```
bigdl-core-xe        2.4.0b20231101
bigdl-core-xe-esimd  2.4.0b20231101
bigdl-llm            2.4.0b20231101
```

The second inference failed with a segmentation fault; log below:

```
/root/anaconda3/envs/bigdl-llm/lib/python3.10/site-packages/bigdl/llm/transformers/models/chatglm2.py:137: UserWarning: IPEX XPU dedicated fusion passes are enabled in ScriptGraph non profiling execution mode. Please enable profiling execution mode to retrieve device guard. (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/jit/fusion_pass.cpp:826.)
  query_layer = apply_rotary_pos_emb(query_layer, rotary_pos_emb)
=========First token cost 5.1180 s=========
=========Rest tokens cost average 0.0331 s (294 tokens in all)=========
Segmentation fault (core dumped)
```
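For context, a minimal sketch (not Fred-cell's exact script) of the BigDL-LLM INT4 XPU inference flow under discussion, following the library's documented `AutoModel` API; the model path and prompt are placeholders:

```python
# Minimal sketch: ChatGLM2-6B INT4 inference on an Intel XPU with BigDL-LLM.
# MODEL_PATH and the prompt are hypothetical placeholders.
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModel

MODEL_PATH = "THUDM/chatglm2-6b"  # substitute a local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# load_in_4bit=True converts the weights to INT4 on load.
model = AutoModel.from_pretrained(MODEL_PATH, load_in_4bit=True,
                                  trust_remote_code=True)
model = model.to("xpu")

input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```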

Fred-cell commented 10 months ago

This configuration is recommended, but not required. [screenshot of the recommended configuration]

Fred-cell commented 10 months ago

And after I set these configurations, the bug still occurs.

cyita commented 10 months ago

You can raise the system open-file limit with `ulimit -n 2048`.
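For reference, a minimal sketch (my addition, not part of BigDL-LLM) of checking and raising the same limit from inside a Python process via the standard `resource` module, equivalent to running `ulimit -n 2048` in the launching shell:

```python
# Minimal sketch: inspect and raise the soft open-file limit on Linux.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"current open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit to 2048, capped at the hard limit
# (only root can raise the hard limit itself).
target = 2048
if hard != resource.RLIM_INFINITY:
    target = min(target, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```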

Fred-cell commented 10 months ago

This is only a workaround. If other people encounter this issue, please comment below.

hkvision commented 10 months ago

Yes, it is a workaround; we will also keep monitoring on our side in case we encounter this in the future.

jason-dai commented 10 months ago

shall we add it to llm-init?

hkvision commented 10 months ago

> shall we add it to llm-init?

Maybe not? We will monitor whether this issue recurs in our environment or is reported by others, and follow up then.

cyita commented 10 months ago

> shall we add it to llm-init?

1024 is enough on our desktops. I also monitored the file descriptors opened while running llama2-7b-half at fp8 precision with 32-32 input/output lengths (20 runs) on arc02; the process opened at most 166 fds. We may need to find the root cause before deciding whether to set this parameter, since the setting affects every other application launched from the same terminal. [screenshot of the fd monitoring]
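For anyone wanting to reproduce this kind of measurement, a minimal sketch (my own, not cyita's actual tooling) that samples the number of file descriptors a Linux process has open by counting entries under `/proc/<pid>/fd`:

```python
# Minimal sketch: sample the peak fd count of a process on Linux.
import os
import time

def count_open_fds(pid: int) -> int:
    """Count entries in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

if __name__ == "__main__":
    pid = os.getpid()  # replace with the PID of the inference process
    peak = 0
    for _ in range(60):          # sample once per second for a minute
        peak = max(peak, count_open_fds(pid))
        time.sleep(1)
    print(f"peak open fds observed: {peak}")
```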