microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

A tutorial to help you finetune Llama-2-7b using this repository full of garbage code with ZeRO2/3 enabled. #430

Open LLMChild opened 1 month ago

LLMChild commented 1 month ago

Setup Environment

First, make sure that everything works in https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama. This confirms that you have resolved all environment issues and can start converting the Hugging Face checkpoint into a ZeRO-enabled checkpoint.
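Before touching any checkpoints, a quick sanity check of the stack can save time. The snippet below is my own addition, not part of the example; it only verifies that the basic pieces are importable and that an NCCL-capable setup is visible.

```python
# Minimal environment sanity check before attempting checkpoint conversion.
import torch
import deepspeed
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("deepspeed:", deepspeed.__version__)
print("transformers:", transformers.__version__)

# ZeRO's collectives run on NCCL, so make sure the backend is built into torch.
print("NCCL available:", torch.distributed.is_nccl_available())
```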

Checkpoint Conversion

The simplest idea is to use the script hf2megads_weight_converter.py with pipeline parallelism disabled to obtain a DeepSpeed ZeRO checkpoint. Unfortunately, this cannot be done with the script as used by https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama; if you try, you will hit the error raised here: https://github.com/microsoft/Megatron-DeepSpeed/blob/3afd267e1e50b1410beb606c5625cc232a55417a/tools/hf2megads_weight_converter.py#L288-L291

You may then think the universal_checkpointing technique can help with this conversion. You might wish it could convert between ZeRO1/2/3 checkpoints with different world sizes and TP/PP/ZeRO1 checkpoints with different parallel sizes, but it cannot convert between TP/PP/ZeRO1 and ZeRO2/3. So there is only one way left: figure out how to build a ZeRO2/3 checkpoint conversion method on top of hf2megads_weight_converter.py.
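To make that route concrete, here is a minimal sketch of the conversion idea, not the actual hf2megads_weight_converter.py logic. It assumes you have already built the Megatron-DeepSpeed model without pipeline parallelism, that ds_config is the same training-style DeepSpeed config (optimizer section plus the ZeRO stage) you plan to finetune with, and that you can supply a name_map callable from Hugging Face parameter names to Megatron ones; all three are placeholders here.

```python
import deepspeed
import torch
from transformers import AutoModelForCausalLM


def convert_hf_to_zero(hf_path, megads_model, ds_config, save_dir, name_map):
    """Copy HF Llama weights into a Megatron-DeepSpeed model and save them
    through the DeepSpeed engine, so the result on disk is a ZeRO checkpoint."""
    # Wrap the model with the same ZeRO config used for finetuning, so that
    # save_checkpoint() below writes the partitioned ZeRO layout directly.
    engine, _, _, _ = deepspeed.initialize(
        model=megads_model,
        model_parameters=megads_model.parameters(),
        config=ds_config,
    )

    # Load the Hugging Face reference weights on CPU to keep GPU memory free.
    hf_state = AutoModelForCausalLM.from_pretrained(
        hf_path, torch_dtype=torch.bfloat16
    ).state_dict()

    # Copy each HF tensor into the matching Megatron parameter. For Llama this
    # also needs QKV and gate/up fusion, omitted here for brevity. Under ZeRO-3
    # the copies must additionally run inside deepspeed.zero.GatheredParameters.
    megads_params = dict(engine.module.named_parameters())
    with torch.no_grad():
        for hf_name, tensor in hf_state.items():
            target = megads_params[name_map(hf_name)]
            target.copy_(tensor.to(dtype=target.dtype, device=target.device))

    # Writing through the engine produces the ZeRO-2/3 checkpoint layout.
    engine.save_checkpoint(save_dir, tag="hf_converted")
```

The key point is that the weights leave through engine.save_checkpoint(), so the finetune job can later resume from them like any other ZeRO checkpoint.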

Finetune script

After getting a ZeRO checkpoint, everything else is quite easy. But since the tutorial at https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/finetune_hf_llama does not expect you to finetune Llama with ZeRO and without pipeline parallelism, a little extra effort is still needed.
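For reference, a DeepSpeed config in the spirit of this setup (ZeRO with no pipeline parallelism) might look like the sketch below. Every number is a placeholder to adapt to your hardware; only the key names are standard DeepSpeed options.

```python
# Illustrative DeepSpeed config for ZeRO-2 finetuning without pipeline parallelism.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
    "bf16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5, "weight_decay": 0.0}},
    "zero_optimization": {
        "stage": 2,                   # switch to 3 for ZeRO-3
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "steps_per_print": 10,
}
```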

For the detailed modifications, please refer to fix-zero-load; with those changes it should work well.
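The exact diff is in the linked fix-zero-load changes and is not reproduced here. Purely as a rough sketch of the load path such a fix is concerned with (my assumption, not the contents of that branch), the converted checkpoint can be pulled into the training engine without optimizer state like this:

```python
# Hypothetical snippet: resume finetuning from the converted ZeRO checkpoint.
# `engine` is the DeepSpeed engine built for training, and the tag matches the
# one assumed in the conversion sketch above.
load_path, client_state = engine.load_checkpoint(
    "/path/to/converted_ckpt",       # placeholder directory
    tag="hf_converted",
    load_optimizer_states=False,     # start finetuning with a fresh optimizer
    load_lr_scheduler_states=False,
)
assert load_path is not None, "DeepSpeed could not find the converted checkpoint"
```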

LLMChild commented 1 month ago

In my case, another problem arises when I specify --untie-embeddings-and-output-weights in the script. The whole program gets stuck in an NCCL all-gather operation. Surprisingly, it gets stuck at a random iteration, making reproduction quite difficult. If you encounter the same situation, try modifying the code in language_model.py to forcibly disable the tensor-parallel (TP) linear layer.

(screenshot: the modified code in language_model.py)
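Since the screenshot does not carry over here, the sketch below shows one possible shape of that kind of workaround (my reconstruction of the idea, not necessarily the exact change in the image): when tensor parallelism is effectively unused, build the output projection as a plain torch.nn.Linear so its forward involves no NCCL collectives.

```python
import torch.nn as nn


def build_output_layer(hidden_size, vocab_size, use_tp_linear=False, tp_linear_cls=None):
    """Illustrative helper: force a plain nn.Linear when the TP linear is disabled.

    `tp_linear_cls` is a stand-in for Megatron's ColumnParallelLinear (the call
    signature here is hypothetical); leaving it unset, or passing
    use_tp_linear=False, takes the non-parallel path, which sidesteps the
    all-gather that the hang above was observed in.
    """
    if use_tp_linear and tp_linear_cls is not None:
        return tp_linear_cls(hidden_size, vocab_size, bias=False)
    return nn.Linear(hidden_size, vocab_size, bias=False)
```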