huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars, 107 forks

Some sanity fix for "PR [Feature] Topology-agnostic optimizer states loading" #29

Closed by xrsrke 8 months ago