xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Save and load checkpoints #29

Open xrsrke opened 10 months ago

xrsrke commented 10 months ago

Notes

APIs

```python
# save checkpoints of a parallelized model
model.save_pretrained(
    save_directory="./checkpoints",
    save_config=True,  # default
    save_function=torch.save,  # default
    merge_checkpoints=True,  # False by default
)

# load checkpoints from a parallelized model
model.from_parallelized(path="./checkpoints")
```
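To make the intent of `merge_checkpoints` concrete, here is a rough sketch of one way the save/load flow could behave. It is not pipegoose code: `save_shards`, `load_shards`, `pickle_save`, `pickle_load`, and the file names `model_rank_{rank}.bin` and `model.bin` are all hypothetical, plain Python lists stand in for tensors, and `pickle` stands in for `torch.save` so the example runs without PyTorch. It also assumes each rank owns a contiguous slice of every parameter, so merging is just concatenation in rank order.

```python
import pickle
import tempfile
from pathlib import Path


def pickle_save(obj, path):
    """Stand-in for torch.save (assumption: same (obj, path) calling order)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def pickle_load(path):
    """Stand-in for torch.load."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_shards(shards, save_directory, save_function=pickle_save,
                merge_checkpoints=False):
    """Write one file per rank, or a single merged file.

    `shards` maps rank -> {param_name: list}, where each list is the slice
    of that parameter owned by the rank (hypothetical layout).
    Returns the list of paths written.
    """
    directory = Path(save_directory)
    directory.mkdir(parents=True, exist_ok=True)

    if merge_checkpoints:
        # Concatenate every rank's slice in rank order into one state dict.
        merged = {}
        for rank in sorted(shards):
            for name, values in shards[rank].items():
                merged.setdefault(name, []).extend(values)
        path = directory / "model.bin"
        save_function(merged, path)
        return [path]

    # Default: keep the shards separate, one file per rank.
    paths = []
    for rank in sorted(shards):
        path = directory / f"model_rank_{rank}.bin"
        save_function(shards[rank], path)
        paths.append(path)
    return paths


def load_shards(path, load_function=pickle_load):
    """Return rank -> state dict for every per-rank file found under `path`."""
    result = {}
    for file in sorted(Path(path).glob("model_rank_*.bin")):
        rank = int(file.stem.rsplit("_", 1)[1])
        result[rank] = load_function(file)
    return result


if __name__ == "__main__":
    shards = {0: {"w": [1, 2]}, 1: {"w": [3, 4]}}
    with tempfile.TemporaryDirectory() as tmp:
        print(save_shards(shards, tmp))
        print(load_shards(tmp))
```

In a real implementation the merge step would depend on how each layer is partitioned (for example, concatenating along the split dimension for tensor parallelism), so the list concatenation here is only a placeholder for that logic.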