xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*

Checkpointing #24

Closed xrsrke closed 11 months ago

xrsrke commented 11 months ago

Save model during training.
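A minimal sketch of what per-rank checkpointing during training could look like. The `save_checkpoint` helper, the filename scheme, and the checkpoint dict layout below are illustrative assumptions, not pipegoose's actual API:

```python
import os

import torch
import torch.distributed as dist


def save_checkpoint(model, optimizer, step, ckpt_dir="checkpoints"):
    """Save one shard per rank so parallelized weights are never
    gathered onto a single device.

    NOTE: illustrative sketch only; the real pipegoose API may differ.
    """
    rank = dist.get_rank() if dist.is_initialized() else 0
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),          # this rank's (possibly sharded) parameters
            "optimizer": optimizer.state_dict(),  # needed to resume training exactly
        },
        os.path.join(ckpt_dir, f"ckpt_step{step}_rank{rank}.pt"),
    )
```

Saving the optimizer state alongside the model is what makes the checkpoint resumable rather than just exportable.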

ChufanSuki commented 11 months ago

I think I'll try to do this this weekend. Do we follow the same API as oslo or do something else?

xrsrke commented 11 months ago

Do we follow the same API as oslo

We don't.

@ChufanSuki Thank you. I think I already wrote this in this link; it seems like a duplicate. But we need someone to convert the checkpoint format to the Hugging Face format. Would you like to work on that?
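For context, converting would roughly mean merging each rank's shard back into a full state dict and saving it with `save_pretrained`. A hedged sketch, where the shard filenames, the `convert_to_hf` helper, and the assumption that shards use plain (unpartitioned) parameter names are all hypothetical:

```python
import glob

import torch
from transformers import AutoModelForCausalLM


def convert_to_hf(ckpt_dir, model_name, out_dir):
    """Merge per-rank checkpoint shards into a single 🤗 checkpoint.

    Assumes non-overlapping parameter names across shards (e.g. pipeline-parallel
    stages); tensor-parallel shards would additionally need to be concatenated
    along the partitioned dimension.
    """
    merged = {}
    for path in sorted(glob.glob(f"{ckpt_dir}/ckpt_step*_rank*.pt")):
        shard = torch.load(path, map_location="cpu")
        merged.update(shard["model"])

    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.load_state_dict(merged)
    model.save_pretrained(out_dir)  # standard 🤗 layout, loadable with from_pretrained
```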

ChufanSuki commented 11 months ago

Sure. I'm not sure we need to convert the format to the 🤗 format, since pipegoose already uses 🤗 transformers.

xrsrke commented 11 months ago

@ChufanSuki I moved the conversation about pushing a parallelized model to the corresponding issue. This one is a duplicate. Please check it here [link]. Thank you.