Closed: wangbing35 closed this issue 2 months ago
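When training a model with a large parameter count on Ascend NPUs using DeepSpeed ZeRO stage 3 with offload, the run fails with: RuntimeError: inplace tensor self must be NPU-Tensor
Reproduction: ds_z3_offload_config.json
Expected behavior: No response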
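For readers unfamiliar with what "stage 3 + offload" switches on, here is a minimal sketch of the relevant DeepSpeed settings. This is not the contents of the reporter's ds_z3_offload_config.json, which is not shown in this thread. It only illustrates the standard `zero_optimization` keys for ZeRO stage 3 with CPU offload, and the `pin_memory` values are an assumption.

```python
import json

# Hypothetical minimal config: NOT the reporter's actual file.
# Only the ZeRO section is shown; a real config also needs batch size,
# precision, and optimizer settings.
zero3_offload_config = {
    "zero_optimization": {
        "stage": 3,  # partition parameters, gradients, and optimizer states
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    }
}

# Serialize to the JSON text that would go into a DeepSpeed config file.
config_text = json.dumps(zero3_offload_config, indent=2)

if __name__ == "__main__":
    print(config_text)
```

With offload enabled, optimizer states and parameters live on the CPU between steps, which is the code path the error above was reported on.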
See https://github.com/microsoft/DeepSpeed/issues/5585. Either wait for the next DeepSpeed release, or build DeepSpeed yourself from the master branch.
After upgrading DeepSpeed to 0.14.3, offload is supported.
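If you want to confirm that your environment already has the version mentioned above, a quick check along these lines may help. It is only a sketch: the helper names are made up for illustration, the 0.14.3 threshold comes from the comment above, and the version parsing is deliberately simplistic (it ignores pre-release and local suffixes).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_VERSION = "0.14.3"  # version reported in this thread as supporting offload


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release part, e.g. '0.14.3+abc' -> (0, 14, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Compare numeric release tuples; suffixes such as 'rc1' are ignored."""
    return parse_release(installed) >= parse_release(minimum)


def installed_deepspeed_version() -> Optional[str]:
    """Return the installed deepspeed version, or None if it is not installed."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_deepspeed_version()
    if found is None:
        print("deepspeed is not installed")
    elif meets_minimum(found):
        print(f"deepspeed {found} is at least {MIN_VERSION}")
    else:
        print(f"deepspeed {found} is older than {MIN_VERSION}; consider upgrading")
```

A build from the master branch may report a version string with a local suffix; this check compares only the numeric part.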