huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Implement pipeline parallel size-agnostic optimizer state loading #71

Closed nopperl closed 7 months ago

nopperl commented 7 months ago

I tried implementing topology-agnostic optimizer state loading for the pipeline parallel dimension (#38). This also fixes the issue I had in #68.
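
For readers unfamiliar with the idea, here is a rough sketch of what pipeline parallel size-agnostic loading could look like. It is not the code from this PR and it does not use nanotron's API: the function names (`merge_pp_shards`, `select_for_rank`), the shard layout, and the parameter names are all hypothetical. It assumes each pipeline stage saved its optimizer state keyed by parameter name, so the shards can be merged and then re-split for whatever pipeline parallel size is used at load time.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: each saved shard maps a parameter name to its optimizer state.
OptimState = Dict[str, float]
Shard = Dict[str, OptimState]


def merge_pp_shards(shards: List[Shard]) -> Shard:
    """Merge per-pipeline-stage optimizer shards into one name-keyed dict."""
    merged: Shard = {}
    for shard in shards:
        for name, state in shard.items():
            # Simplifying assumption: each parameter lives on exactly one stage.
            if name in merged:
                raise ValueError(f"parameter {name!r} appears in more than one shard")
            merged[name] = state
    return merged


def select_for_rank(merged: Shard, local_param_names: Iterable[str]) -> Shard:
    """Keep only the states for parameters owned by the current pipeline stage."""
    return {name: merged[name] for name in local_param_names}


# Example: a checkpoint saved with pp=2 is loaded with pp=1.
saved = [
    {"layer0.weight": {"exp_avg": 0.1}},  # shard written by pipeline stage 0
    {"layer1.weight": {"exp_avg": 0.2}},  # shard written by pipeline stage 1
]
merged = merge_pp_shards(saved)
local_state = select_for_rank(merged, ["layer0.weight", "layer1.weight"])
```

Keying by parameter name rather than by stage index is what makes the sketch independent of the pipeline parallel size: the saved stage boundaries no longer matter once the shards are merged.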