intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Result is wrong when running Qwen2-1.5B-Instruct on Intel NPU #11795

Open grandxin opened 2 months ago

grandxin commented 2 months ago

I just followed the example at https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/generate.py

When I use load_low_bit=sym_int4, the result is wrong (the prompt 什么是电子竞技 means "What is esports?"):

-------------------- Output --------------------
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
什么是电子竞技<|im_end|>
<|im_start|>assistant
League League League trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail trail

With load_low_bit=sym_int8, the result is correct:

-------------------- Output --------------------
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
什么是电子竞技<|im_end|>
<|im_start|>assistant
电子竞技(Electronic Sports)是一种以电子游戏为比赛项目,通过网络进行的体育运动。它包括了多种类型的游戏,如《英雄联盟
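For anyone comparing the two precisions automatically, a rough sanity check on the generated text can flag this kind of failure. The sketch below is only an illustration and is not part of ipex-llm: `longest_token_run` and `looks_degenerate` are hypothetical helper names, the threshold of 5 is an arbitrary assumption, and whitespace splitting is a crude tokenizer (Chinese text mostly ends up as one or two long tokens, so it only catches space-separated repetition like the sym_int4 output above).

```python
def longest_token_run(text: str) -> int:
    """Length of the longest run of identical consecutive whitespace-separated tokens."""
    longest = 0
    current = 0
    previous = None
    for token in text.split():
        if token == previous:
            current += 1
        else:
            current = 1
            previous = token
        longest = max(longest, current)
    return longest


def looks_degenerate(text: str, max_run: int = 5) -> bool:
    """Heuristic: treat output as degenerate if one token repeats more than max_run times in a row.

    max_run=5 is an assumed threshold for illustration, not a tuned value.
    """
    return longest_token_run(text) > max_run


# Abridged samples modelled on the two outputs reported in this issue.
sym_int4_like = "League League League " + "trail " * 20
sym_int8_like = "电子竞技(Electronic Sports)是一种以电子游戏为比赛项目,通过网络进行的体育运动。"

print(looks_degenerate(sym_int4_like))  # True
print(looks_degenerate(sym_int8_like))  # False
```

A check like this only detects one failure pattern (a single repeated token); it says nothing about whether a non-repetitive answer is actually correct.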

plusbang commented 2 months ago

Qwen2-7B-Instruct has been verified, and we will first try to reproduce your error. We will let you know as soon as there is progress.

plusbang commented 2 months ago

Hi @grandxin, I could not reproduce this error on MTL (Meteor Lake) with the 32.0.100.2540 driver.

With ipex-llm==2.1.0b20240814, the output of Qwen2-1.5B-Instruct with load_low_bit=sym_int4 is:

-------------------- Output --------------------
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
什么是电子竞技<|im_end|>
<|im_start|>assistant
电子竞技是一种基于游戏和视频游戏平台的体育运动,参与者使用特定的游戏控制器(如键盘、鼠标、手柄等)进行控制。与其他
--------------------------------------------------------------------------------
done

Please check the driver version and ipex-llm version.
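As a minimal sketch of the ipex-llm half of that check, the installed package version can be read with the standard library and compared against the build mentioned above. The helper name `installed_version` is made up for this example, and the snippet assumes the package is installed under the distribution name `ipex-llm`. The driver version is not covered here and has to be checked separately through the operating system.

```python
from importlib.metadata import PackageNotFoundError, version

# Build that produced the correct sym_int4 output in this thread.
REFERENCE = "2.1.0b20240814"


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("ipex-llm")  # assumed distribution name
if found is None:
    print("ipex-llm is not installed in this environment")
elif found == REFERENCE:
    print(f"ipex-llm {found} matches the build used in this thread")
else:
    print(f"ipex-llm {found} is installed; this thread used {REFERENCE}")
```

Run it inside the same Python environment that runs generate.py, since a different environment may hold a different build.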

grandxin commented 2 months ago

OK, it's solved. Thanks!