pytorch / torchtitan

A native PyTorch Library for large model training
BSD 3-Clause "New" or "Revised" License

[checkpointing] import async checkpoint with pinned memory only when needed #333

Closed tianyu-l closed 4 months ago

tianyu-l commented 4 months ago

Stack from ghstack (oldest at bottom):

"Async checkpointing with pinned memory" requires a very recent PyTorch nightly. Since we don't want to pin a specific nightly in requirements.txt, and this checkpointing option is not the default, let's import it only when that option is actually selected instead of unconditionally at module load.
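
For illustration, the deferred-import pattern described above could look roughly like the sketch below. It is not the actual diff from this PR: `AsyncMode`, `load_optional_backend`, and the `module_name` parameter are hypothetical names, and the real module that provides pinned-memory staging is deliberately left as an argument rather than guessed.

```python
import importlib
from enum import Enum


class AsyncMode(str, Enum):
    # Hypothetical option values, used only for this sketch.
    DISABLED = "disabled"
    ASYNC = "async"
    ASYNC_WITH_PINNED_MEM = "async_with_pinned_mem"


def load_optional_backend(mode: AsyncMode, module_name: str):
    """Import `module_name` only when the pinned-memory mode is selected.

    Returns the imported module, or None for every other mode, so users
    on an older PyTorch build never trigger the import.
    """
    if mode != AsyncMode.ASYNC_WITH_PINNED_MEM:
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Surface a clearer message than the bare import failure.
        raise ImportError(
            f"{mode.value!r} checkpointing needs {module_name!r}, "
            "which may require a newer PyTorch nightly."
        ) from exc
```

With this shape, `load_optional_backend(AsyncMode.DISABLED, ...)` never touches the optional module, so the default path keeps working on any PyTorch version; only users who opt in to the pinned-memory mode can hit the import error.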