intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

[Max1100/bigdl-llm] Met OOM easily when running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch #9979

Open Yanli2190 opened 7 months ago

Yanli2190 commented 7 months ago

When running llama2-7b/Mistral-7B-v0.1 int4/fp8 multi-batch on Max1100, we easily hit an OOM issue. It looks like, with multi-batch enabled, running the model for multiple iterations makes GPU memory keep increasing with each iteration.

HW: Max1100
OS: Ubuntu 22.04
SW: oneAPI 2024.0 / bigdl-llm 2.5.0b20240118 based on torch 2.1
GPU driver: https://dgpu-docs.intel.com/releases/stable_775_20_20231219.html

How to reproduce:

  1. Create a conda env and install bigdl-llm via "pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu"
  2. Run the attached run.sh on Max1100 and monitor the GPU memory via "sudo xpu-smi dump -m 0,1,2,3,4,5,18" (a sketch of the benchmark loop is shown after this list)
  3. The GPU memory increases with each iteration, and we hit OOM after several iterations
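For reference, a minimal sketch of what such a multi-batch benchmark loop might look like; the attached run.sh / benchmark_hf_model_bigdl.txt are authoritative, and the model path, prompt, batch size, and token counts below are assumptions:

```python
# Hypothetical sketch of a multi-batch benchmark (not the attached scripts).
import time
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the "xpu" device)
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-hf"    # assumption
batch_size, in_len, out_len = 8, 512, 512  # matches the reported 512 tokens x 8 batch

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",  # int4 case; "fp8" for the fp8 case
    trust_remote_code=True,
)
model = model.half().to("xpu")

prompt = "Once upon a time " * (in_len // 4)  # rough way to reach ~512 input tokens
inputs = tokenizer([prompt] * batch_size, return_tensors="pt",
                   truncation=True, max_length=in_len).to("xpu")

with torch.inference_mode():
    for i in range(12):
        torch.xpu.synchronize()
        st = time.time()
        model.generate(**inputs, max_new_tokens=out_len, do_sample=False)
        torch.xpu.synchronize()
        print(f"iter {i + 1}: {time.time() - st:.2f} sec total")
        # Watch GPU memory in parallel with:
        #   sudo xpu-smi dump -m 0,1,2,3,4,5,18
```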
Yanli2190 commented 7 months ago

log.txt

Yanli2190 commented 7 months ago

benchmark_hf_model_bigdl.txt run.txt

Ricky-Ting commented 7 months ago

We failed to reproduce this problem on our machine (Max 1100).

Environments:

bigdl-llm version: 2.5.0b20240118
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max 1100 OpenCL 3.0 NEO  [23.30.26918.50]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max 1100 1.3 [1.3.26918]

Here is the log:

 - INFO - intel_extension_for_pytorch auto imported
loading model...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 13.34it/s]
 - INFO - Converting the current model to sym_int4 format......
LlamaAttention(
  (q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
  (k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
  (v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
  (o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
  (rotary_emb): LlamaRotaryEmbedding()
)
warming up for 10 iterations...
finished warmup
prefill (512 tokens x 8 batch) + generation (512 tokens x 8 batch):
0
    iter 1:  xx sec total
1
    iter 2:  xx sec total
2
    iter 3:  xx sec total
3
    iter 4:  xx sec total
4
    iter 5:  xx sec total
5
    iter 6:  xx sec total
6
    iter 7:  xx sec total
7
    iter 8:  xx sec total
8
    iter 9:  xx sec total
9
    iter 10:  xx sec total
10
    iter 11:  xx sec total
11

Here are the GPU memory stats:

Timestamp, DeviceId, GPU Utilization (%), GPU Power (W), GPU Frequency (MHz), GPU Core Temperature (Celsius Degree), GPU Memory Temperature (Celsius Degree), GPU Memory Utilization (%), GPU Memory Used (MiB)
01:02:55.000,    0, 99.76, 196.87, 1550,  N/A,  N/A, 43.91, 21579.29
01:02:56.000,    0, 99.81, 196.76, 1550,  N/A,  N/A, 43.91, 21579.29
01:02:57.000,    0, 99.82, 197.18, 1550,  N/A,  N/A, 43.91, 21579.29
01:02:58.000,    0, 99.85, 197.55, 1550,  N/A,  N/A, 43.91, 21579.29
01:02:59.000,    0, 89.60, 184.65,    0,  N/A,  N/A, 43.91, 21579.29
01:03:00.000,    0, 0.00, 27.95,    0,  N/A,  N/A, 43.91, 21579.29
01:03:01.000,    0, 0.00, 27.88,    0,  N/A,  N/A, 43.91, 21579.29
01:03:02.000,    0, 0.00, 27.85,    0,  N/A,  N/A, 43.91, 21579.29
01:03:03.000,    0, 0.00, 27.78,    0,  N/A,  N/A, 43.91, 21579.29
01:03:04.000,    0, 9.05, 51.09, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:05.000,    0, 99.33, 209.28, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:06.000,    0, 99.67, 191.56, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:07.000,    0, 99.77, 192.01, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:08.000,    0, 99.82, 193.04, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:09.000,    0, 99.78, 192.70, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:10.000,    0, 99.82, 192.79, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:11.000,    0, 99.82, 192.93, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:12.000,    0, 99.82, 193.61, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:13.000,    0, 99.79, 193.89, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:14.000,    0, 99.69, 194.24, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:15.000,    0, 99.51, 194.23, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:16.000,    0, 99.55, 195.14, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:17.000,    0, 99.55, 195.87, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:18.000,    0, 99.54, 195.74, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:19.000,    0, 99.74, 196.17, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:20.000,    0, 99.71, 196.35, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:21.000,    0, 99.82, 197.02, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:22.000,    0, 99.82, 197.39, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:23.000,    0, 99.83, 197.36, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:24.000,    0, 99.85, 197.49, 1550,  N/A,  N/A, 43.91, 21579.28
01:03:25.000,    0, 40.15, 109.33,    0,  N/A,  N/A, 43.91, 21579.28
01:03:26.000,    0, 4.43, 46.86,    0,  N/A,  N/A, 43.91, 21579.28
01:03:27.000,    0, 0.00, 27.86,    0,  N/A,  N/A, 43.91, 21579.28
01:03:28.000,    0, 0.00, 27.75,    0,  N/A,  N/A, 43.91, 21579.28
01:03:29.000,    0, 0.00, 27.72,    0,  N/A,  N/A, 43.91, 21579.28
01:03:30.000,    0, 58.30, 191.29, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:31.000,    0, 99.38, 191.68, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:32.000,    0, 99.57, 191.20, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:33.000,    0, 99.68, 191.98, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:34.000,    0, 99.81, 192.40, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:35.000,    0, 99.82, 192.78, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:36.000,    0, 99.81, 193.62, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:37.000,    0, 99.82, 193.19, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:38.000,    0, 99.81, 193.47, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:39.000,    0, 99.80, 193.92, 1550,  N/A,  N/A, 46.94, 23066.18
01:03:40.000,    0, 98.81, 194.58, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:41.000,    0, 99.59, 195.55, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:42.000,    0, 99.60, 196.16, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:43.000,    0, 99.73, 196.77, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:44.000,    0, 99.75, 196.72, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:45.000,    0, 99.77, 196.74, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:46.000,    0, 99.80, 197.54, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:47.000,    0, 99.75, 197.94, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:48.000,    0, 99.84, 197.89, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:49.000,    0, 99.82, 197.96, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:50.000,    0, 99.83, 198.46, 1550,  N/A,  N/A, 43.91, 21579.29
01:03:51.000,    0, 1.77, 45.10,    0,  N/A,  N/A, 43.91, 21579.29
01:03:52.000,    0, 0.00, 27.82,    0,  N/A,  N/A, 43.91, 21579.29
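To make it easier to compare against the reporter's run, here is a small sketch (the input file name is an assumption) that summarizes the "GPU Memory Used (MiB)" column of an xpu-smi dump, so one can quickly see whether the peak grows across iterations:

```python
# Sketch: summarize GPU memory from an `xpu-smi dump` CSV (file name is an assumption).
import csv

mem_samples = []
with open("xpu_smi_dump.csv", newline="") as f:
    reader = csv.reader(f)
    header = [col.strip() for col in next(reader)]
    mem_idx = header.index("GPU Memory Used (MiB)")
    for row in reader:
        if len(row) > mem_idx:
            mem_samples.append(float(row[mem_idx]))

print(f"samples: {len(mem_samples)}")
print(f"min / max GPU memory used: {min(mem_samples):.2f} / {max(mem_samples):.2f} MiB")
# In the log above, usage oscillates between ~21579 MiB and ~23066 MiB
# with no monotonic increase, i.e. no leak is visible on this machine.
```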