intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.62k stars · 1.26k forks

NPU inference error #11495

Open xduzhangjiayu opened 3 months ago

xduzhangjiayu commented 3 months ago

Hi, I am interested in NPU inference for this project. I tried to run Llama on the NPU with `python\llm\example\NPU\HF-Transformers-AutoModels\Model\llama2\generate.py`. I used the `model.save_low_bit` and `AutoModelForCausalLM.load_low_bit` interfaces to save and load the converted model, but during the load phase I get the error `AttributeError: 'LlamaAttention' object has no attribute 'llama_attention_forward'`. I am not sure whether, by saving and loading this way, the converted model ends up not being for the NPU?

Any comment or advice is appreciated, thanks !

xduzhangjiayu commented 3 months ago

During the process there is a warning on the console: `site-packages\intel_npu_acceleration_library\backend\__init__.py:18: UserWarning: NPU is not available in your system. Library will fallback to AUTO device selection mode`

But I do have an Intel NPU and the latest drivers on my PC, and the LLM inference script runs correctly when I use intel_npu_acceleration_library directly.

leonardozcm commented 3 months ago

hi @xduzhangjiayu , for ipex-llm >= 2.1.0b20240704 you may try:

model.save_low_bit(model_path)

to save low bit model, and

AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True)

to load the low-bit model.
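Putting the two snippets above together, a minimal save-then-load sketch might look like the following. This is an illustration only: the import path, the `load_in_low_bit` value, and `model_path` are assumptions modeled on the repo's NPU examples, and running it requires `ipex-llm[npu]` (>= 2.1.0b20240704) plus an Intel NPU.

```python
# Sketch only: assumes ipex-llm[npu] >= 2.1.0b20240704 and an Intel NPU.
# The import path and parameters below follow the repo's NPU examples
# and are not verified here; model_path is a placeholder.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "./llama2-low-bit"  # hypothetical output directory

# First run: convert the HF checkpoint to a low-bit model and save it.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_low_bit="sym_int4",   # 4-bit symmetric quantization
    trust_remote_code=True,
)
model.save_low_bit(model_path)

# Later runs: load the converted low-bit model directly,
# skipping the conversion step.
model = AutoModelForCausalLM.load_low_bit(model_path, trust_remote_code=True)
```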

xduzhangjiayu commented 3 months ago

I cannot install ipex-llm==2.1.0b20240704 for now; I'll try later, thanks!

RAKSHITH-JAYANTH commented 1 month ago

Hello, I too am facing the same error, even with ipex-llm==2.1.0b20240704. It was of no use; I am still getting the warning below:

\llm\Lib\site-packages\intel_npu_acceleration_library\backend\__init__.py:18: UserWarning: NPU is not available in your system. Library will fallback to AUTO device selection mode
  check_npu_and_driver_version()

I am using the latest NPU driver as well. Also, when I do not import ipex-llm or intel_extension_for_pytorch, the NPU is recognized by intel_npu_acceleration_library. But as soon as I import either of them, the NPU is no longer recognized.

ch1y0q commented 1 month ago

> Hello, I too am facing the same error and used ipex-llm==2.1.0b20240704. But it was of no use. I am still getting the below error. \llm\Lib\site-packages\intel_npu_acceleration_library\backend\__init__.py:18: UserWarning: NPU is not available in your system. Library will fallback to AUTO device selection mode check_npu_and_driver_version()
>
> I am using the latest NPU driver as well. Also, without me importing ipex-llm or intel_extension_for_pytorch, the NPU is getting recognized by intel_npu_acceleration_library. But when I import any of these, the NPU is not recognized.

Hi @RAKSHITH-JAYANTH, we are unable to reproduce the same error. Maybe you can try upgrading ipex-llm to a newer version, e.g. 2.2.0b20240909, with `pip install --pre --upgrade ipex-llm[npu]`? Also, we have just updated some examples and the usage of save & load for NPU inference; you can refer to https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/ .
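For reference, the suggested upgrade as a copy-pasteable install fragment (the exact resolved version depends on pip's pre-release resolution at install time):

```shell
# Upgrade to a recent pre-release build of ipex-llm with NPU extras
pip install --pre --upgrade "ipex-llm[npu]"

# Confirm which version actually got installed
pip show ipex-llm
```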

RAKSHITH-JAYANTH commented 1 month ago

Hello! This solution gets the NPU working, but importing intel_extension_for_pytorch then raises the OSError below.

import intel_extension_for_pytorch

C:\Users\raksh\miniconda3\envs\llm\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\raksh\miniconda3\envs\llm\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.' If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
  warn(
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\raksh\miniconda3\envs\llm\Lib\site-packages\intel_extension_for_pytorch\__init__.py", line 79, in <module>
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\raksh\miniconda3\envs\llm\Lib\site-packages\intel_extension_for_pytorch\bin\intel-ext-pt-gpu.dll" or one of its dependencies.

Then, once I run `pip install ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/`, this import works, but the NPU doesn't get recognized.

My requirement is that I need both the NPU and the GPU (xpu) to work in the same program.

Another point to note: if I install ipex or ipex_llm[xpu] without installing ipex_llm[npu], the NPU is recognized in programs that use intel_npu_acceleration_library and do not import ipex or ipex_llm for xpu. But the moment I import ipex or ipex_llm for xpu, the NPU is no longer recognized.

For your information, my OS is Windows.