microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Support universal checkpoint for GPTModel #361

Closed · mosheisland closed 8 months ago

mosheisland commented 8 months ago

Save the universal checkpoint patterns required for GPTModel into its checkpoints.

Additionally, unify the universal checkpoint info logic for GPTModel and GPTModelPipe under a new class: UniversalCheckpointInfo.
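
To illustrate the direction of the change, below is a minimal, hypothetical sketch of how a shared UniversalCheckpointInfo helper could collect the parameter patterns for both GPTModel and GPTModelPipe and attach them to a checkpoint state dict. The key names, regex patterns, the `using_model_pipe` flag, and the `_UNIVERSAL_CHECKPOINT_INFO_KEY` constant are assumptions made for illustration; they are not taken from the actual PR or from DeepSpeed's API.

```python
# Hypothetical sketch: one class that builds the universal-checkpoint
# metadata shared by GPTModel and GPTModelPipe. All key names and
# parameter patterns below are illustrative assumptions, not PR contents.

# Assumed key under which the info is stored in the checkpoint state dict.
_UNIVERSAL_CHECKPOINT_INFO_KEY = "universal_checkpoint_info"


class UniversalCheckpointInfo:
    """Collects the parameter patterns needed to convert a sharded
    checkpoint into a 'universal' (parallelism-agnostic) one."""

    def __init__(self, using_model_pipe: bool):
        # GPTModelPipe (pipeline-parallel) and GPTModel may need slightly
        # different pattern sets, so a flag lets one class serve both.
        self.using_model_pipe = using_model_pipe

    def get(self) -> dict:
        info = {
            # Parameters replicated across tensor-parallel ranks.
            "tp_replicated_parameter_patterns": [
                r".*layernorm\.weight",
                r".*layernorm\.bias",
            ],
            # Parameters sharded along the row dimension.
            "parameter_with_row_parallelism_patterns": [
                r".*dense\.weight",
                r".*dense_4h_to_h\.weight",
            ],
            # Vocabulary-sharded parameters.
            "vocabulary_parameter_patterns": [
                r".*word_embeddings\.weight",
            ],
        }
        if self.using_model_pipe:
            # Pipeline-replicated parameters apply only to GPTModelPipe.
            info["pipeline_replicated_parameter_patterns"] = [
                r".*position_embeddings\.weight",
            ]
        return info


def add_universal_checkpoint_info(state_dict: dict, using_model_pipe: bool) -> dict:
    """Attach the universal-checkpoint metadata to a checkpoint state dict."""
    state_dict[_UNIVERSAL_CHECKPOINT_INFO_KEY] = UniversalCheckpointInfo(
        using_model_pipe
    ).get()
    return state_dict
```

With a helper along these lines, the save paths of both GPTModel and GPTModelPipe could call the same `add_universal_checkpoint_info(...)` instead of each maintaining its own pattern lists, which is the unification described above; the real class likely differs in its keys and patterns.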