jianzhnie / LLamaTuner

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
https://jianzhnie.github.io/llmtech/
Apache License 2.0
547 stars 60 forks

Fix the checkpoint saving issues when zero3 is enabled #99

Closed (lyxok1 closed this 1 month ago)

lyxok1 commented 7 months ago

Hi, I changed the checkpoint-saving behavior to call the save_model interface of the official Trainer from hf-transformers. That function handles saving under the different training frameworks, including DeepSpeed ZeRO-3, where the model is saved either as a pytorch.bin model by DeepSpeed (in old versions of transformers) or as a collected state_dict (in new versions of transformers).
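
For readers who have not seen the pattern, here is a minimal sketch of the idea described above, not the actual diff in this PR. The helper name `save_final_checkpoint` and the `SupportsSaveModel` protocol are hypothetical and exist only for illustration; the one real API assumed is `Trainer.save_model(output_dir)` from hf-transformers. The point is to delegate saving to the trainer rather than saving the wrapped model directly, since under ZeRO-3 each rank holds only a shard of the parameters.

```python
from typing import Protocol


class SupportsSaveModel(Protocol):
    """Hypothetical structural type: anything with a Trainer-style save_model()."""

    def save_model(self, output_dir: str) -> None: ...


def save_final_checkpoint(trainer: SupportsSaveModel, output_dir: str) -> str:
    """Save the final model through the trainer (hypothetical helper).

    With DeepSpeed ZeRO-3 the parameters are partitioned across ranks, so
    saving the bare model on one rank would write incomplete weights.
    Trainer.save_model is expected to handle the gathering, so this helper
    must be called on every rank, not only on the main process.
    """
    trainer.save_model(output_dir)
    return output_dir
```

In a training script this would be called as `save_final_checkpoint(trainer, training_args.output_dir)` after `trainer.train()`, where `trainer` is a `transformers.Trainer` instance and `training_args` is an assumed name for its `TrainingArguments`.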