intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

InvalidModule: Invalid SPIR-V module #10502

Open zeminli opened 6 months ago

zeminli commented 6 months ago

(llm) E:\chatcode>set SYCL_CACHE_PERSISTENT=1

(llm) E:\chatcode>set BIGDL_LLM_XMX_DISABLED=1

(llm) E:\chatcode>python chatglm3_infer_gpu.py
D:\Users\admin\anaconda3\envs\llm\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'D:\Users\admin\anaconda3\envs\llm\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
2024-03-22 08:41:49,715 - INFO - intel_extension_for_pytorch auto imported
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 15.62it/s]
2024-03-22 08:41:50,393 - INFO - Converting the current model to sym_int4 format......
2024-03-22 08:42:00,156 - WARNING - Setting eos_token is not supported, use the default one.
2024-03-22 08:42:00,157 - WARNING - Setting pad_token is not supported, use the default one.
2024-03-22 08:42:00,157 - WARNING - Setting unk_token is not supported, use the default one.
InvalidModule: Invalid SPIR-V module: unsupported SPIR-V version number 'unknown (66560)'. Range of supported/known SPIR-V versions is 1.0 (65536) - 1.3 (66304)

env: conda -> Python 3.9.18 (screenshot)

system -> Windows 10 (screenshot)

python script:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer
import intel_extension_for_pytorch as ipex
import torch

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "e:/chatcode/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)

# Run the optimized model on the Intel GPU
model = model.to('xpu')

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build the prompt in ChatGLM3 format
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")

# Encode the prompt and move it to the GPU
input_ids = tokenizer.encode(prompt, return_tensors="pt")
input_ids = input_ids.to('xpu')
st = time.time()

# Run inference and generate tokens
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()

# Decode the generated tokens and print the result
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)

JinBridger commented 6 months ago

Hi zeminli,

This looks like it may be caused by the GPU driver. Please update your GPU driver and try again.

If it still crashes, please run this script in your Python environment and share its output along with your GPU driver version :)
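
The linked script isn't reproduced in this thread; purely as an illustration (not necessarily the exact script referenced above), a minimal check along these lines prints the PyTorch/IPEX versions and whether the XPU device is visible:

import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

# Illustrative sketch only; the script linked above may differ.
print("torch version:", torch.__version__)
print("ipex version :", ipex.__version__)
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    for i in range(torch.xpu.device_count()):
        print(f"device {i}:", torch.xpu.get_device_name(i))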

zeminli commented 6 months ago

> Hi zeminli,
>
> This looks like it may be caused by the GPU driver. Please update your GPU driver and try again.
>
> If it still crashes, please run this script in your Python environment and share its output along with your GPU driver version :)


The same error still occurs.

Updated GPU driver: 31.0.101.2115 -> 31.0.101.2127

Result of running env-check.bat:

(llm) E:\chatcode\check>env-check.bat
Python 3.9.18
-----------------------------------------------------------------
transformers=4.31.0
-----------------------------------------------------------------
torch=2.1.0a0+cxx11.abi
-----------------------------------------------------------------
Name: bigdl-llm
Version: 2.5.0b20240320
Summary: Large Language Model Develop Toolkit
Home-page: https://github.com/intel-analytics/BigDL
Author: BigDL Authors
Author-email: bigdl-user-group@googlegroups.com
License: Apache License, Version 2.0
Location: d:\users\admin\anaconda3\envs\llm\lib\site-packages
Requires:
Required-by:
-----------------------------------------------------------------
D:\Users\admin\anaconda3\envs\llm\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'D:\Users\admin\anaconda3\envs\llm\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
ipex=2.1.10+xpu
-----------------------------------------------------------------
System Information

Host Name:        DESKTOP-J0IIQOO
OS Name:          Microsoft Windows 10 Pro
OS Version:       10.0.19045 N/A Build 19045
OS Manufacturer:  Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type:    Multiprocessor Free
Registered Owner: admin
Registered Organization: N/A
Product ID:       00330-80000-00000-AA193
Original Install Date: 2021/5/19, 9:38:23
System Boot Time: 2024/3/22, 12:03:20
System Manufacturer: LENOVO
System Model:     90MQCTO1WW
System Type:      x64-based PC
Processor(s):     1 Processor(s) Installed.
                  [01]: Intel64 Family 6 Model 165 Stepping 5 GenuineIntel ~2904 Mhz
BIOS Version:     LENOVO M31KT22A, 2020/10/15
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device:      \Device\HarddiskVolume1
System Locale:    zh-cn;Chinese (China)
Input Locale:     zh-cn;Chinese (China)
Time Zone:        (UTC+08:00) Beijing, Chongqing, Hong Kong SAR, Urumqi
Total Physical Memory:     32,550 MB
Available Physical Memory: 19,249 MB
Virtual Memory: Max Size:  67,366 MB
Virtual Memory: Available: 51,130 MB
Virtual Memory: In Use:    16,236 MB
Page File Location(s): D:\pagefile.sys
Domain:           WORKGROUP
Logon Server:     \\DESKTOP-J0IIQOO
Hotfix(s):        38 Hotfix(s) Installed.
                  [01]: KB5034466
                  [02]: KB5029714
                  [03]: KB4562830
                  [04]: KB4570334
                  [05]: KB4577586
                  [06]: KB4580325
                  [07]: KB4593175
                  [08]: KB5003791
                  [09]: KB5011048
                  [10]: KB5011050
                  [11]: KB5012170
                  [12]: KB5015684
                  [13]: KB5035845
                  [14]: KB5006753
                  [15]: KB5007273
                  [16]: KB5011352
                  [17]: KB5011651
                  [18]: KB5014032
                  [19]: KB5014035
                  [20]: KB5014671
                  [21]: KB5015895
                  [22]: KB5016705
                  [23]: KB5018506
                  [24]: KB5020372
                  [25]: KB5022924
                  [26]: KB5023794
                  [27]: KB5025315
                  [28]: KB5026879
                  [29]: KB5028318
                  [30]: KB5028380
                  [31]: KB5029709
                  [32]: KB5031539
                  [33]: KB5032392
                  [34]: KB5032907
                  [35]: KB5034224
                  [36]: KB5036447
                  [37]: KB5005699
                  [38]: KB5034441
Network Card(s):  5 NIC(s) Installed.
                  [01]: Realtek PCIe GbE Family Controller
                        Connection Name: Ethernet
                        DHCP Enabled:    Yes
                        DHCP Server:     10.88.40.1
                        IP address(es)
                          [01]: 10.88.40.40
                          [02]: fe80::c2f0:cd97:39ef:3382
                  [02]: Hyper-V Virtual Ethernet Adapter
                        Connection Name: vEthernet (WSL)
                        DHCP Enabled:    No
                        IP address(es)
                          [01]: 172.30.192.1
                          [02]: fe80::46be:898d:f2a9:e638
                  [03]: TAP-Windows Adapter V9
                        Connection Name: Local Area Connection
                        Status:          Media disconnected
                  [04]: TAP-Windows Adapter V9
                        Connection Name: Local Area Connection 2
                        Status:          Media disconnected
                  [05]: Sangfor SSL VPN CS Support System VNIC
                        Connection Name: Ethernet 2
                        Status:          Media disconnected
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.
-----------------------------------------------------------------
'xpu-smi.exe' is not recognized as an internal or external command, operable program or batch file.
xpu-smi is not installed properly.
The system cannot find the batch label specified - end

JinBridger commented 6 months ago

Hi zeminli,

Sorry, we haven't tested bigdl-llm on this type of iGPU, so we can't reproduce the problem or offer a workable fix.

But you can still run bigdl-llm on CPU. You can find the bigdl-llm CPU installation guide here. Please note that installing bigdl-llm for CPU is different from the GPU installation, and the Python script may need some changes to run on CPU.
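
As a rough sketch of what a CPU variant of the script above could look like (assuming bigdl-llm is installed with CPU support as described in the linked guide), the ipex import and the .to('xpu') calls are simply dropped so everything stays on the CPU:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"
model_path = "e:/chatcode/chatglm3-6b"

# Load the ChatGLM3-6B model with INT4 quantization; without .to('xpu') it runs on the CPU
model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="What is Intel?")
input_ids = tokenizer.encode(prompt, return_tensors="pt")

st = time.time()
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()

print(f'Inference time: {end-st} s')
print(tokenizer.decode(output[0], skip_special_tokens=True))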

zeminli commented 6 months ago

> Hi zeminli,
>
> Sorry, we haven't tested bigdl-llm on this type of iGPU, so we can't reproduce the problem or offer a workable fix.
>
> But you can still run bigdl-llm on CPU. You can find the bigdl-llm CPU installation guide here. Please note that installing bigdl-llm for CPU is different from the GPU installation, and the Python script may need some changes to run on CPU.

Ok, I'll try. Thank you