huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

Fused Layer Norm #5

Closed xrsrke closed 8 months ago

xrsrke commented 9 months ago

TODO

xrsrke commented 9 months ago

@NouamaneTazi

Both are done. I ran fused layer norm and fast layer norm for 10k update steps; here are the convergence results.

image

image
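For readers unfamiliar with the operation being compared, here is a minimal pure-Python sketch of what a layer norm computes and how an alternative implementation can be checked against a reference. It is not the code used for the runs above: the function names are hypothetical, the learnable weight and bias are omitted, and the one-pass (Welford) variant only stands in for the kind of restructuring a fused kernel might do.

```python
import math


def layer_norm_reference(x, eps=1e-5):
    """Two-pass layer norm over one vector: mean first, then variance."""
    n = len(x)
    mean = sum(x) / n
    # Biased (divide-by-n) variance, as layer norm conventionally uses.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def layer_norm_one_pass(x, eps=1e-5):
    """Same result, but mean and variance come from one pass (Welford's update).

    Illustrative stand-in only; not the actual fused or fast kernel.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for i, v in enumerate(x, start=1):
        delta = v - mean
        mean += delta / i
        m2 += delta * (v - mean)
    var = m2 / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


x = [0.5, -1.25, 3.0, 2.0, -0.75, 1.5]
ref = layer_norm_reference(x)
alt = layer_norm_one_pass(x)
print("max abs diff:", max_abs_diff(ref, alt))
```

A numerical check like this only shows that two implementations agree on a single input; it complements convergence runs such as the ones above and does not replace them.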

xrsrke commented 8 months ago

Just reran it after the new changes:

image