Dear Song,
I'm happy to hear that you successfully ran our project. In the model directory you will find a `zero_to_fp32.py` script. This script extracts fp32 consolidated weights from ZeRO stage 2 and 3 DeepSpeed checkpoints. Running it automatically generates a single model file according to the ZeRO stage you used during finetuning. For example:
```
python zero_to_fp32.py "../model" "global_step0/0model.pkl"
```
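The file this produces is an ordinary PyTorch state dict, so it can be loaded with plain `torch.load`. A minimal sketch, assuming the output path from the command above and that `model` is the finetuned `nn.Module` (a placeholder here, not a name from the project):

```python
import torch

# Load the consolidated fp32 state dict written by zero_to_fp32.py
# (the output path matches the example command above).
state_dict = torch.load("global_step0/0model.pkl", map_location="cpu")

# `model` stands for whatever nn.Module was finetuned (hypothetical here):
# model.load_state_dict(state_dict)
```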
Furthermore, when you finetune the model on multiple GPUs, you will get multiple checkpoint files whose names start with `zero_pp_rank_`; their number equals the number of GPUs. If you then want to use the model for inference or other tasks, you need `zero_to_fp32.py` to merge these files into a single usable file. The details involve how DeepSpeed saves models; you can refer to https://github.com/microsoft/DeepSpeed for more information.
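If you prefer to do the merge programmatically instead of via the script, DeepSpeed also exposes the same consolidation logic as a function. A minimal sketch, assuming the `../model` checkpoint directory from the example above (the exact API may vary slightly across DeepSpeed versions):

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# "../model" is the checkpoint directory DeepSpeed wrote during finetuning;
# the checkpoint tag (e.g. "global_step0") is read from its `latest` file by default.
state_dict = get_fp32_state_dict_from_zero_checkpoint("../model")

# The merged fp32 state dict can then be loaded into the finetuned module:
# model.load_state_dict(state_dict)
```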
Best, Yin Fang
Hi,
After I tried preprocessing and fine-tuning, I found that two checkpoint files were generated: one called 'mp_rank_00_model_states.pt' and the other called 'zero_pp_rank_0_mp_rank_00_optim_states.pt'. It seems that either of them can be passed as the checkpoint_path for the subsequent generative task, and they give me similar generative results. I would highly appreciate more details or an explanation.
Many thanks in advance!
Song