pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

keep only latest k checkpoints #366

Closed: liangluofb closed this 1 month ago

liangluofb commented 1 month ago

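Adds a config option that keeps only the latest k checkpoints and purges older ones. Useful for pretraining runs with frequent checkpointing and large step counts, where old checkpoints would otherwise pile up on disk.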
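
For illustration, here is a rough sketch of what "keep only the latest k" could look like. It is not the code from this change. The function name `purge_old_checkpoints`, the `keep_latest_k` parameter, and the `step-<N>` folder naming are all assumptions made for the example, as is treating a non-positive value as "purging disabled".

```python
import re
import shutil
from pathlib import Path


def purge_old_checkpoints(checkpoint_dir: str, keep_latest_k: int) -> list[str]:
    """Delete all but the newest `keep_latest_k` step folders.

    Hypothetical helper: assumes each checkpoint lives in a folder named
    `step-<N>` directly under `checkpoint_dir`. Returns the deleted names.
    """
    if keep_latest_k <= 0:
        # Assumption for this sketch: non-positive k means "keep everything".
        return []

    steps = []
    for entry in Path(checkpoint_dir).iterdir():
        match = re.fullmatch(r"step-(\d+)", entry.name)
        if entry.is_dir() and match:
            steps.append((int(match.group(1)), entry))

    # Sort numerically by step so that step-100 ranks after step-20.
    steps.sort(key=lambda pair: pair[0])

    # Everything except the last k entries is stale.
    stale = steps[:-keep_latest_k]
    for _, path in stale:
        shutil.rmtree(path)
    return [path.name for _, path in stale]
```

Sorting on the parsed step number rather than the folder name matters here: a plain string sort would place `step-100` before `step-20` and could delete the wrong checkpoints. Folders that do not match the pattern are left untouched.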