intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

llama3-8B causes MTL iGPU runtime error when ipex-llm is running AI inference #10999

Open zcwang opened 2 months ago

zcwang commented 2 months ago

Hello ipex-llm experts, I'm hitting an issue with Llama-3-8B on the MTL-H iGPU and would appreciate any advice from you. :)

There seems to be an issue with the iGPU on the MTL 155H, but no issue with the Arc A770, on Ubuntu 22.04 with kernel v6.8.2.
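For context, the llama3 generate.py example being run here (from python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3) follows the usual ipex-llm HF-Transformers pattern. The snippet below is only a rough sketch of that flow, with the model path and prompt taken from the command below; the exact arguments of the maintained script may differ, and the real script also applies the Llama-3 chat template, which is why the prompt in the log is wrapped in <|start_header_id|> markers.

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"

# load_in_4bit=True converts the weights to sym_int4, matching the
# "Converting the current model to sym_int4 format" line in the log below.
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.half().to("xpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
with torch.inference_mode():
    input_ids = tokenizer.encode("History of Intel", return_tensors="pt").to("xpu")
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=False))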

(llm-test) intel@mydevice:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:0 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-13 14:56:26,831 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 11.45it/s]
2024-05-13 14:56:27,298 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Inference time: 1.4299554824829102 s
-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The fascinating history of Intel!

Intel Corporation, one of the world's leading semiconductor companies, has a rich history that spans over six decades. Here's a brief overview:

**Early Years (1957-1969)**

Intel was founded on July 18, 1957, by Gordon Moore and Robert Noy

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 12.174066 s

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 11.134912 s

Environment info

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 155H OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO [24.13.29138.7]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.29138]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]

intel_extension_for_pytorch   2.1.20+git0e2bee2
torch                         2.1.0.post0+cxx11.abi
torchvision                   0.16.0+fbb4cc5
sentence-transformers         2.3.1
transformers                  4.37.0
transformers-stream-generator 0.0.5
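As a hedged aside, the attention_mask / pad_token_id warnings in the log above are generic transformers messages rather than part of the iGPU problem; passing both explicitly to generate() is one way to silence them (illustrative only, reusing the model and tokenizer names from the sketch earlier in this comment):

# Pass attention_mask and pad_token_id explicitly so transformers does not
# have to guess them during open-ended generation.
inputs = tokenizer("History of Intel", return_tensors="pt").to("xpu")
output = model.generate(input_ids=inputs.input_ids,
                        attention_mask=inputs.attention_mask,
                        pad_token_id=tokenizer.eos_token_id,
                        max_new_tokens=64)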

qiuxin2012 commented 2 months ago

The Arc A770 and the iGPU can't work in the same environment; we are still working on it. Related issue: https://github.com/intel-analytics/ipex-llm/issues/10940. But the error is different; it should be RuntimeError: could not create a primitive. This difference may be caused by your different torch version.
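As a hedged aside, when both the A770 and the MTL iGPU are present it can be worth confirming which XPU index the process actually sees (in the sycl-ls listing above, level_zero:gpu:0 is the A770 and gpu:1 is the iGPU). A rough sketch, assuming the torch.xpu namespace provided by intel_extension_for_pytorch:

import torch
import intel_extension_for_pytorch  # noqa: F401 -- registers the XPU backend

# Print every XPU device visible to this process so the index used with
# ONEAPI_DEVICE_SELECTOR can be matched to a physical GPU.
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))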

zcwang commented 2 months ago

Got it! I will remove the Arc A770 and test the iGPU on MTL again.

BTW, I also tested the same SW environment on my TGL platform (Core i7-1185G7), and the iGPU indeed works well.

intel_extension_for_pytorch   2.1.20+git0e2bee2
torch                         2.1.0.post0+cxx11.abi
torchvision                   0.16.0+fbb4cc5
intel-openmp                  2024.1.0
openvino                      2024.1.0
openvino-telemetry            2024.1.0

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.29138]

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Intel Corporation is an American multinational corporation that specializes in the design and manufacture of microprocessors, memory chips, and other semiconductor technologies. Here is a brief history of the company:

Early Years (1968-1979)

Intel was founded on July 18, 1968, by Gordon Moore and Robert N



@qiuxin2012, I appreciate your support.
zcwang commented 2 months ago

@qiuxin2012, I confirmed the MTL-H iGPU works well without the Arc A770 in the platform.

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2024.17.3.0.08_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 155H OpenCL 3.0 (Build 0) [2024.17.3.0.08_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) Graphics OpenCL 3.0 NEO  [24.13.29138.7]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) Graphics 1.3 [1.3.29138]
...
(llm) intel@mydevice:~/work/ipex-llm/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama3$ ONEAPI_DEVICE_SELECTOR=level_zero:0 python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt 'History of Intel' --n-predict 64
2024-05-15 10:36:33,547 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00,  5.48it/s]
2024-05-15 10:36:34,559 - INFO - Converting the current model to sym_int4 format......
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
Inference time: 6.857227563858032 s
-------------------- Prompt --------------------
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

-------------------- Output (skip_special_tokens=False) --------------------
<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>

History of Intel<|eot_id|><|start_header_id|>assistant<|end_header_id|>

The legendary Intel!

Intel Corporation is an American multinational corporation that specializes in the design and manufacture of microprocessors, the "brain" of modern computers. Here's a brief history of the company:

**Early Years (1968-1971)**

Intel was founded on July 18, 1968, by Gordon

LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [Intel(R) Core(TM) Ultra 7 155H]
Registry and code: 13 MB
Command: python ./generate.py --repo-id-or-model-path meta-llama/Meta-Llama-3-8B-Instruct --prompt History of Intel --n-predict 64
Uptime: 63.459550 s