mindspore-lab / mindnlp

Easy-to-use and high-performance NLP and LLM framework based on MindSpore, compatible with models and datasets of 🤗Huggingface.
https://mindnlp.cqu.ai/
Apache License 2.0
702 stars, 197 forks

Kernel error and Ascend error #1698

Closed: CMJ7733 closed 3 weeks ago

CMJ7733 commented 1 month ago

Describe the bug (Mandatory) Launch kernel failed: Bprop/gradDense/Reshape-op278

The error from device(chipId:0, dieId:0), serial number is 13, an exception occurred during AICPU execution, stream_id:2, task_id:19016, errcode:21008, msg:inner error.[FUNC:ProcessStarsAicpuErrorInfo][FILE:device_error_proc.cc][LINE:1232]

mindspore/ccsrc/runtime/graph_scheduler/actor/kernel_actor.cc:917 ExecuteLaunchKernelTask
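For triage, it can help to pull the numeric fields (such as `errcode`, `stream_id`, `task_id`) and the bracketed source tags out of the device error line above. The following is a minimal sketch using only the Python standard library; the helper name `parse_device_error` is hypothetical and not part of MindSpore or MindNLP, and the regular expressions assume the log keeps the `key:value` and `[KEY:value]` shapes seen in this report.

```python
import re

# Error text copied verbatim from the report above.
LOG = (
    "The error from device(chipId:0, dieId:0), serial number is 13, "
    "an exception occurred during AICPU execution, stream_id:2, task_id:19016, "
    "errcode:21008, msg:inner error."
    "[FUNC:ProcessStarsAicpuErrorInfo][FILE:device_error_proc.cc][LINE:1232]"
)


def parse_device_error(text):
    """Hypothetical helper: collect fields from one device error line."""
    fields = {}
    # Bracketed tags such as [FUNC:...], [FILE:...], [LINE:...] (kept as strings).
    for key, value in re.findall(r"\[(\w+):([^\]]+)\]", text):
        fields[key] = value
    # Numeric pairs such as chipId:0, stream_id:2, errcode:21008.
    # These overwrite a bracketed tag of the same name (LINE) with an int.
    for key, value in re.findall(r"(\w+):(\d+)", text):
        fields[key] = int(value)
    return fields


fields = parse_device_error(LOG)
for name in ("errcode", "stream_id", "task_id", "FUNC", "FILE", "LINE"):
    print(f"{name}: {fields[name]}")
```

Including the extracted `errcode` and `FUNC` values alongside the failing operator name (`Bprop/gradDense/Reshape-op278`) makes the report easier to search against other issues.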

Device: Ascend

Mode: graph

To Reproduce (Mandatory) Running the training and evaluation code in the lora_seq2seq file raises the error.

Expected behavior (Mandatory) The error is resolved.

Screenshots / Logs (Mandatory) [screenshot: 69c06c89694a301fd921f07855294da]

Additional context (Optional) Add any other context about the problem here.

lvyufeng commented 1 month ago

Is this a 910B or a 910A?

lvyufeng commented 1 month ago

Please share the code.