volcengine / veScale

A PyTorch Native LLM Training Framework
http://vescale.xyz
Apache License 2.0

[QUESTION] Save checkpoint #26

Closed: Ryanuppp closed this issue 4 months ago

Ryanuppp commented 4 months ago

Is there a way to save checkpoints during training with veScale?

leonardo0lyj commented 4 months ago

@Ryanuppp Great question! Checkpointing is now open source: https://github.com/volcengine/veScale/pull/27