modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

NPU: Qwen2 model inference error #1951

Open JiayuQiao opened 1 week ago

JiayuQiao commented 1 week ago

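Error description

When running the swift infer command, inference fails with an error if do_sample=True. With do_sample=False, inference runs, but the generated output is garbled.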

Environment

Inference model

Qwen2-7B-Instruct

Error output

EZ9999: Inner Error!
EZ9999 Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1677]
TraceBack (most recent call last):
AICPU Kernel task happen error, retCode=0x2a.[FUNC:GetError][FILE:stream.cc][LINE:1454]
Aicpu kernel execute failed, device_id=0, stream_id=28, task_id=1726, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1522]
Aicpu kernel execute failed, device_id=0, stream_id=28, task_id=1726, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1454]
rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50]
synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
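In case it helps anyone reading the runtime log above, here is a minimal sketch that splits it into one entry per `[FUNC:...][FILE:...][LINE:...]` trailer. The names `ENTRY` and `split_log` are hypothetical helpers written for this post, not part of swift or the Ascend toolkit, and the pattern only assumes the trailer layout visible in the log.

```python
import re

# Assumption: every runtime log entry ends with a [FUNC:...][FILE:...][LINE:...] trailer,
# as in the log pasted in this issue. The message before it may contain other brackets.
ENTRY = re.compile(
    r"(?P<msg>.*?)\[FUNC:(?P<func>[^\]]+)\]\[FILE:(?P<file>[^\]]+)\]\[LINE:(?P<line>\d+)\]",
    re.S,
)


def split_log(text):
    """Return one dict per log entry with its message, function, file, and line."""
    entries = []
    for m in ENTRY.finditer(text):
        entries.append({
            "message": m.group("msg").strip(),
            "func": m.group("func"),
            "file": m.group("file"),
            "line": int(m.group("line")),
        })
    return entries


# Two entries copied from the log in this issue, joined on one line as they were pasted.
sample = (
    "rtStreamSynchronize execute failed, reason=[aicpu exception]"
    "[FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:50] "
    "synchronize stream failed, runtime result = 507018"
    "[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]"
)

for entry in split_log(sample):
    print(entry["file"], entry["line"], entry["message"])
```

Text that has no trailer, such as the leading `EZ9999: Inner Error!` line, is folded into the message of the next entry rather than reported separately.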

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/mnt/dsep/python/venv/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/mnt/dsep/python/venv/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/swift/llm/utils/utils.py", line 694, in _model_generate
    return model.generate(*args, **kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1525, in generate
    return self.sample(
  File "/mnt/dsep/python/venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2669, in sample
    streamer.put(next_tokens.cpu())
RuntimeError: ACL stream synchronize failed, error code:507018

JiayuQiao commented 1 week ago

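I ran some more tests. In the Qwen2-Instruct series, only the 0.5B model runs inference normally. None of the other models work, and they fail with the same error as the 7B model.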