Cannot train DeepSeek models on Ascend cards: will this be supported?

hiyouga / LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

https://arxiv.org/abs/2403.13372

Apache License 2.0

25.98k stars 3.23k forks

Cannot train DeepSeek models on Ascend cards: will this be supported? #4361

Open sweetning0809 opened 3 weeks ago

sweetning0809 commented 3 weeks ago

Reminder

[X] I have read the README and searched the existing issues.

System Info

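Training DeepSeek-series models on an NPU requires the flash-attn library, but because of a conflict that library cannot be used on NPUs, so training fails. Would you consider supporting this? The forward pass would probably need to be switched to the NPU flash attention operator: https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html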

Reproduction

llamafactory-cli train

Expected behavior

I hope training DeepSeek models on NPUs can be supported.

Others

No response

sweetning0809 commented 3 weeks ago

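I looked into the code. The main parts involved are LlamaFlashAttention2 and LlamaSdpaAttention in longlora.py. Following https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/performance_tuning_0027.html, lines 516 and 531 of the file in transformers that defines LlamaFlashAttention2 would probably need to be replaced with the torch_npu.npu_fusion_attention call described in that document.

The same applies to SDPA, which could perhaps be supported as well.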
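
For illustration, the replacement proposed above could look roughly like the sketch below. It is not code from LLaMA-Factory or transformers: the helper names `pick_attention_backend`, `npu_fusion_kwargs`, and `npu_attention` are hypothetical, the inputs are assumed to be laid out as (batch, seq_len, num_heads, head_dim), and the keyword arguments reflect my reading of the Ascend documentation for `torch_npu.npu_fusion_attention`, so they should be checked against the installed torch_npu version. No causal mask is built here; a real replacement would have to pass a suitable `atten_mask`.

```python
import importlib.util
import math


def pick_attention_backend():
    """Return "npu_fusion", "flash_attn", or "eager" based on installed packages."""
    if importlib.util.find_spec("torch_npu") is not None:
        return "npu_fusion"
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attn"
    return "eager"


def npu_fusion_kwargs(num_heads, head_dim, dropout_p=0.0):
    """Build keyword arguments for the fused operator (assumed BSND input layout)."""
    return {
        "head_num": num_heads,
        "input_layout": "BSND",  # (batch, seq_len, num_heads, head_dim)
        "scale": 1.0 / math.sqrt(head_dim),  # usual softmax scaling
        "keep_prob": 1.0 - dropout_p,  # the operator takes a keep probability
    }


def npu_attention(query, key, value, num_heads, dropout_p=0.0, atten_mask=None):
    """Hypothetical stand-in for the flash-attn call inside the attention forward."""
    # Imported lazily so this file still loads on machines without an Ascend stack.
    import torch_npu

    kwargs = npu_fusion_kwargs(num_heads, query.shape[-1], dropout_p)
    # The operator returns a tuple; the first element is the attention output.
    return torch_npu.npu_fusion_attention(
        query, key, value, atten_mask=atten_mask, **kwargs
    )[0]
```

Keeping the argument construction separate from the operator call makes the mapping from flash-attn style arguments (dropout probability, softmax scale) to the NPU operator's arguments easy to inspect on a machine without an NPU.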

sweetning0809 commented 3 weeks ago

The modeling_deepseek.py file shipped with the model also needs to be changed.

sweetning0809 commented 3 weeks ago

What is mainly needed is a GPU-to-NPU model migration: https://www.hiascend.com/document/detail/zh/Pytorch/60RC1/ptmoddevg/trainingmigrguide/PT_LMTMOG_0016.html
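
As a rough sketch of what such a migration can start from: to my knowledge torch_npu ships a `transfer_to_npu` module that redirects CUDA-style PyTorch calls to the NPU once it is imported. Whether the linked page describes exactly this route is an assumption on my part, and the wrapper function `enable_npu_migration` is a hypothetical name used only for this example. It would not remove the need to replace the flash-attn calls discussed earlier in this thread.

```python
def enable_npu_migration():
    """Try to activate torch_npu's automatic CUDA-to-NPU redirection.

    Returns True if the Ascend adapter could be imported, False otherwise.
    """
    try:
        import torch_npu  # noqa: F401  (registers the NPU device with PyTorch)
        from torch_npu.contrib import transfer_to_npu  # noqa: F401  (patches CUDA-style calls)
    except ImportError:
        # No Ascend stack installed: leave PyTorch untouched.
        return False
    return True


if __name__ == "__main__":
    print("NPU migration active:", enable_npu_migration())
```

The call would need to run before the model and trainer are created, so that later device placement goes through the patched functions.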
