hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Ascend NPU: training succeeds but inference fails #3840

Closed apachemycat closed 4 months ago

apachemycat commented 4 months ago

Reminder

Reproduction

ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 llamafactory-cli chat /data/scripts/llama2_lora_sft.yaml
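The issue does not include the contents of `llama2_lora_sft.yaml`. For context, a LLaMA-Factory chat config for a LLaMA-2 7B LoRA setup might look like the sketch below; every value is an assumption, not taken from the issue:

```yaml
# llama2_lora_sft.yaml -- hypothetical contents, not from the issue
model_name_or_path: meta-llama/Llama-2-7b-hf    # assumed base model
adapter_name_or_path: saves/llama2-7b/lora/sft  # assumed LoRA adapter dir
template: llama2
finetuning_type: lora
```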

llama 2 7b model

[ma-user LLaMA-Factory]$ npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2                                  Version: 23.0.rc2                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)          Hugepages-Usage(page)  |
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB) HBM-Usage(MB)          |
+===========================+===============+====================================================+
| 0     910B                | OK            | 70.8        36               0 / 0                  |
| 0                         | 0000:C1:00.0  | 0           2187 / 15137     1 / 32768              |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU   Chip                | Process id    | Process name                 | Process memory(MB)   |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+

Expected behavior

nfig.json
[INFO|configuration_utils.py:962] 2024-05-21 12:05:29,784 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}

05/21/2024 12:05:29 - INFO - llamafactory.model.utils.attention - Using torch SDPA for faster training and inference.
05/21/2024 12:05:29 - INFO - llamafactory.model.adapter - Adapter is not found at evaluation, load the base model.
05/21/2024 12:05:29 - INFO - llamafactory.model.loader - all params: 6738415616
Welcome to the CLI application, use `clear` to remove the history, use `exit` to exit the application.

User: xxxxxxxx
Assistant:
[W VariableFallbackKernel.cpp:51] Warning: CAUTION: The operator 'aten::isin.Tensor_Tensor_out' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/logits_process.py:1591: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead. If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at torch_npu/csrc/aten/common/TensorFactories.cpp:74.)
  scores_processed = torch.where(scores != scores, 0.0, scores)
E39999: Inner Error!
E39999: 2024-05-21-12:06:49.978.985 An exception occurred during AICPU execution, stream_id:56, task_id:3319, errcode:21008, msg:inner error[FUNC:ProcessAicpuErrorInfo][FILE:device_error_proc.cc][LINE:730]
TraceBack (most recent call last):
  Kernel task happen error, retCode=0x2a, [aicpu exception].[FUNC:PreCheckTaskErr][FILE:task_info.cc][LINE:1776]
  Aicpu kernel execute failed, device_id=0, stream_id=56, task_id=3319, errorCode=2a.[FUNC:PrintAicpuErrorInfo][FILE:task_info.cc][LINE:1579]
  Aicpu kernel execute failed, device_id=0, stream_id=56, task_id=3319, fault op_name=[FUNC:GetError][FILE:stream.cc][LINE:1512]
  rtStreamSynchronize execute failed, reason=[aicpu exception][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
  synchronize stream failed, runtime result = 507018[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]

DEVICE[0] PID[308274]: EXCEPTION TASK: Exception info: TGID=2533077, model id=65535, stream id=56, stream phase=3, task id=3319, task type=aicpu kernel, recently received task id=3323, recently send task id=3318, task phase=RUN
Message info[0]: aicpu=0, slot_id=0, report_mailbox_flag=0x5a5a5a5a, state=0x5210
Other info[0]: time=2024-05-21-12:06:49.250.837, function=proc_aicpu_task_done, line=970, error code=0x2a
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 1736, in generate
    result = self._sample(
  File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/generation/utils.py", line 2426, in _sample
    streamer.put(next_tokens.cpu())
RuntimeError: ACL stream synchronize failed, error code:507018

System Info

No response

Others

No response

hiyouga commented 4 months ago

Set do_sample: false
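A plausible reading of this fix (my interpretation, not stated in the thread): with do_sample: false, generation uses greedy decoding, which per step only needs an argmax over the logits, whereas sampling additionally draws from the softmax distribution with a multinomial-style kernel, extra work of the kind that, per the AICPU errors above, can fail on this NPU backend. A minimal pure-Python sketch of the difference between the two decoding steps:

```python
# Sketch: greedy decoding vs. sampling for one decoding step.
# Pure Python, illustrative only -- not LLaMA-Factory or transformers code.
import math
import random

def greedy_next_token(logits):
    # Greedy (do_sample: false): just argmax over the logits.
    return max(range(len(logits)), key=lambda i: logits[i])

def sampled_next_token(logits, rng=random):
    # Sampling (do_sample: true): softmax, then a multinomial draw.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

logits = [0.1, 2.5, -1.0, 0.3]
print(greedy_next_token(logits))   # deterministic: index of the largest logit
print(sampled_next_token(logits))  # random: any index, weighted by softmax
```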