huggingface/nanotron
Minimalistic large language model 3D-parallelism training
Apache License 2.0
[Refactor] Add support to resume training using optimizer states with different topology #19
Closed
NouamaneTazi closed this 8 months ago
NouamaneTazi commented 8 months ago
Add support for topology-agnostic loading of optimizer states, so that training can be resumed from optimizer states saved under a different parallel topology (for example, a different data-parallel (DP) size).
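Below is a minimal sketch of what "topology-agnostic" loading means for DP-sharded optimizer states. It assumes a ZeRO-1-style layout in which each DP rank stores one contiguous slice of every parameter's flattened optimizer state (e.g. Adam's `exp_avg`). The function names (`split_for_dp`, `merge_dp_shards`, `reshard_optimizer_state`) are hypothetical and are not nanotron's API. The idea is to merge the saved shards back into the full state, then re-split it for the new DP size.

```python
from typing import Dict, List

# Hypothetical layout (assumption, not nanotron's format): each DP rank holds a
# contiguous slice of one parameter's flattened optimizer state, ZeRO-1 style.
Shard = List[float]


def split_for_dp(flat: List[float], dp_size: int) -> List[Shard]:
    """Split a flat state into dp_size contiguous chunks.

    The first len(flat) % dp_size ranks each receive one extra element.
    """
    if dp_size < 1:
        raise ValueError("dp_size must be >= 1")
    base, extra = divmod(len(flat), dp_size)
    shards: List[Shard] = []
    start = 0
    for rank in range(dp_size):
        size = base + (1 if rank < extra else 0)
        shards.append(flat[start:start + size])
        start += size
    return shards


def merge_dp_shards(shards: List[Shard]) -> List[float]:
    """Reassemble the full, topology-independent state from per-rank shards."""
    merged: List[float] = []
    for shard in shards:
        merged.extend(shard)
    return merged


def reshard_optimizer_state(
    saved: Dict[str, List[Shard]], new_dp_size: int
) -> Dict[str, List[Shard]]:
    """Convert per-parameter shards saved under one DP size to another DP size."""
    return {
        name: split_for_dp(merge_dp_shards(shards), new_dp_size)
        for name, shards in saved.items()
    }


if __name__ == "__main__":
    exp_avg = [float(i) for i in range(10)]
    saved = {"layer.weight": split_for_dp(exp_avg, 4)}  # checkpoint written with DP=4
    resumed = reshard_optimizer_state(saved, 2)         # resume with DP=2
    print([len(s) for s in saved["layer.weight"]])      # [3, 3, 2, 2]
    print([len(s) for s in resumed["layer.weight"]])    # [5, 5]
```

This sketch only covers a change in DP size; a change in tensor- or pipeline-parallel size would need analogous merge/split logic along the corresponding sharding dimensions.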