Fix the checkpoint saving issues when zero3 is enabled

jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

https://jianzhnie.github.io/llmtech/

Apache License 2.0


Fix the checkpoint saving issues when zero3 is enabled #99

Closed: lyxok1 closed this 5 months ago

lyxok1 commented 11 months ago

Hi, I changed the checkpoint-saving behavior to call the `save_model` method of the official `Trainer` from Hugging Face Transformers. That method handles saving under the different training backends, including DeepSpeed ZeRO-3. With older versions of Transformers, the model is saved through DeepSpeed as a PyTorch `.bin` weights file; with newer versions, the full `state_dict` is gathered from the partitioned parameters before saving.
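A minimal sketch of the idea, assuming a Transformers-style `Trainer` object is already constructed and trained. The names `save_final_model` and `SupportsSaveModel` are hypothetical helpers for illustration, not part of LLamaTuner or Transformers. The point is to delegate to `trainer.save_model(...)` rather than writing `model.state_dict()` by hand.

```python
from typing import Optional, Protocol


class SupportsSaveModel(Protocol):
    """Hypothetical protocol: anything with a Trainer-style save_model method."""

    def save_model(self, output_dir: Optional[str] = None) -> None: ...


def save_final_model(trainer: SupportsSaveModel, output_dir: str) -> None:
    """Hypothetical helper: save trained weights via Trainer.save_model.

    Avoid the hand-rolled alternative:
        torch.save(trainer.model.state_dict(), path)
    Under DeepSpeed ZeRO-3 each rank holds only a partition of every
    parameter, so that state_dict does not contain the full weights.
    Trainer.save_model performs the backend-specific saving instead.
    """
    # Call this on every rank, not only rank 0: under ZeRO-3, gathering the
    # partitioned parameters is a collective operation, and save_model itself
    # decides which process actually writes the files.
    trainer.save_model(output_dir)
```

In a training script this would typically run right after `trainer.train()`, e.g. `save_final_model(trainer, trainer.args.output_dir)`. Guarding the call with a rank-0 check is the usual mistake to avoid here, since the other ranks would never join the parameter gather.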