huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars, 107 forks

[Feature] DoReMi #54

Closed (xrsrke closed 7 months ago)

xrsrke commented 8 months ago

README: https://github.com/huggingface/nanotron/blob/xrsrke/feature_doremi_new_codebase/examples/doremi/README.md

xrsrke commented 7 months ago

add codespell:

[image]

bring back isort:

[image]