hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0
25.52k stars 3.16k forks

Single-GPU full-parameter fine-tuning of GLM-4 with DeepSpeed ds_z3_offload_config exits with return code = -9. When this happens, CPU memory (252 GB) is completely used up. How can this be resolved? #4629

Closed ldknight closed 2 days ago

ldknight commented 2 days ago

Reminder

System Info

Version update date: 0526

Reproduction

Single-GPU full-parameter fine-tuning of GLM-4 using DeepSpeed with ds_z3_offload_config

Expected behavior

No response

Others

No response

hiyouga commented 2 days ago

Increase system memory.
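For context, a return code of -9 means the process was killed by SIGKILL, which on Linux is consistent with the kernel's out-of-memory killer when host RAM is exhausted. The back-of-the-envelope sketch below shows why ZeRO-3 with CPU offload can fill roughly 252 GB on a single GPU. Every number in it is an assumption, not something stated in this issue: about 9.4 billion parameters (the issue only says "glm4"), about 18 bytes of host memory per parameter (fp16 weights and gradients, fp32 master weights, and the two Adam moments, rounded up), and a 1.5x allowance for buffers. The function name is hypothetical.

```python
def estimate_host_memory_gib(num_params, bytes_per_param=18, buffer_factor=1.5):
    """Rough host-RAM estimate for ZeRO-3 with optimizer and parameter offload.

    bytes_per_param and buffer_factor are rule-of-thumb assumptions,
    not measured values for any particular model or DeepSpeed version.
    """
    total_bytes = num_params * bytes_per_param * buffer_factor
    return total_bytes / 2**30  # convert bytes to GiB


# Assumed parameter count for a GLM-4 9B-class model (not given in the issue).
ASSUMED_PARAMS = 9.4e9

needed_gib = estimate_host_memory_gib(ASSUMED_PARAMS)
print(f"Estimated host memory: {needed_gib:.0f} GiB")
```

Under these assumptions the estimate lands in the same range as the machine's installed RAM, which leaves little headroom for the dataloader, the operating system, and the model-loading peak. If adding RAM is not possible, reducing the per-parameter footprint (for example with a parameter-efficient method such as LoRA instead of full fine-tuning) or spreading the states across more GPUs are the usual alternatives.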