microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

How to use Deepspeed checkpoints for BingBertSquad finetuning? #129

Open xycforgithub opened 3 years ago

xycforgithub commented 3 years ago

Hi there, the tutorial https://www.deepspeed.ai/tutorials/bert-finetuning/#loading-huggingface-and-tensorflow-pretrained-models makes clear how to load HF and TF checkpoints into DeepSpeed. What if we want to load a DeepSpeed checkpoint, like one from the Bing BERT example? Should we just load the "mp_rank_00_model_states.pt" file from the checkpoint directory?

I'm currently training with fp16 and ZeRO-2, so I wonder whether loading that file directly would lose some precision. Should I use zero_to_fp32 to convert the checkpoint to fp32 for loading?

1024er commented 2 years ago


Is there an answer to this question? Thank you.

tjruwase commented 2 years ago

@xycforgithub and @1024er, apologies for the late response.

Yes, please use zero_to_fp32 to convert the DeepSpeed checkpoint to a standard PyTorch fp32 weights checkpoint.
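
For context: when training with fp16, `mp_rank_00_model_states.pt` holds the fp16 model weights, while the ZeRO-2 optimizer partitions hold the fp32 master copies, so going through zero_to_fp32 recovers full precision. Below is a minimal sketch of that conversion, assuming a placeholder checkpoint directory `./saved_checkpoints` and using the helper DeepSpeed ships in `deepspeed.utils.zero_to_fp32`:

```python
# A minimal sketch, assuming a ZeRO-2 checkpoint written by save_checkpoint().
# "./saved_checkpoints" and "pytorch_model_fp32.bin" are placeholder paths.
import torch
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Merge the partitioned fp32 master weights from the ZeRO optimizer states
# into a single consolidated state dict (no fp16 precision loss).
state_dict = get_fp32_state_dict_from_zero_checkpoint("./saved_checkpoints")

# Save as a standard PyTorch checkpoint.
torch.save(state_dict, "pytorch_model_fp32.bin")

# Later, load it like any other PyTorch state dict:
# model.load_state_dict(torch.load("pytorch_model_fp32.bin"))
```

DeepSpeed also copies a standalone `zero_to_fp32.py` script into the checkpoint folder itself, which can be run from there, e.g. `python zero_to_fp32.py . pytorch_model.bin` (exact arguments vary by DeepSpeed version).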