Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0 · 6.69k stars · 1.26k forks
Run llama2 on windows A750 failed: No module named 'linear_fp16_esimd' #10698
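For reference, here is a minimal sketch of the kind of script that exercises this attention code path, following the standard ipex-llm Transformers-style API from the project's GPU examples; the checkpoint path, prompt, and generation settings are illustrative assumptions, not taken from this report:

```python
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

# Assumed Llama 2 checkpoint path -- replace with the actual model directory.
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load with ipex-llm's low-bit optimization and move the model to the Intel GPU (A750).
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")

tokenizer = LlamaTokenizer.from_pretrained(model_path)

with torch.inference_mode():
    input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
    # generate() drives the model forward pass, which is where the
    # llama_attention_forward_4_31 frames in the traceback come from.
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```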
File "C:\Users\arda\miniconda3\envs\xin-llm\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\arda\miniconda3\envs\xin-llm\lib\site-packages\ipex_llm\transformers\models\llama.py", line 320, in llama_attention_forward_4_31
return forward_function(
File "C:\Users\arda\miniconda3\envs\xin-llm\lib\site-packages\ipex_llm\transformers\models\llama.py", line 642, in llama_attention_forward_4_31_original
use_esimd_sdp(q_len, key_states.shape[2], self.head_dim, query_states, attention_mask):
File "C:\Users\arda\miniconda3\envs\xin-llm\lib\site-packages\ipex_llm\transformers\models\utils.py", line 336, in use_esimd_sdp
import linear_fp16_esimd
ModuleNotFoundError: No module named 'linear_fp16_esimd'
Get below error:
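One quick diagnostic (not part of the original report) is to probe whether the native extension named in the traceback is importable at all in the active conda environment; `linear_fp16_esimd` is the exact module that `use_esimd_sdp` tries to import:

```python
# Check whether the linear_fp16_esimd native extension is visible to Python.
import importlib.util

spec = importlib.util.find_spec("linear_fp16_esimd")
if spec is None:
    print("linear_fp16_esimd not found; the ESIMD SDP fast path is unavailable.")
else:
    print(f"linear_fp16_esimd located at: {spec.origin}")
```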