huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars, 107 forks

[Feature] Spectral µTransfer #123

Closed xrsrke closed 5 months ago

xrsrke commented 6 months ago

An implementation of "A Spectral Condition for Feature Learning" (follow-up work to Greg Yang's µTransfer).

https://arxiv.org/abs/2310.17813
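For readers unfamiliar with the paper, here is a rough sketch of the per-layer scaling it proposes: the initialization standard deviation is chosen so the weight matrix's spectral norm scales like sqrt(fan_out / fan_in), and the SGD learning rate is scaled by fan_out / fan_in. The function names and the `base_std` / `base_lr` parameters are hypothetical, picked for illustration only. This is not the code in this PR, and the learning-rate rule shown is the SGD form (other optimizers such as Adam need a different scaling).

```python
import math


def spectral_init_std(fan_in: int, fan_out: int, base_std: float = 1.0) -> float:
    """Init std giving a spectral norm of order sqrt(fan_out / fan_in).

    Hypothetical helper: sigma = base_std / sqrt(fan_in) * min(1, sqrt(fan_out / fan_in)).
    """
    return base_std / math.sqrt(fan_in) * min(1.0, math.sqrt(fan_out / fan_in))


def spectral_lr(fan_in: int, fan_out: int, base_lr: float) -> float:
    """Per-layer SGD learning rate, scaled by fan_out / fan_in (hypothetical helper)."""
    return base_lr * fan_out / fan_in


if __name__ == "__main__":
    # Compare a square layer with an up-projection at two widths.
    for width in (256, 1024):
        square = spectral_init_std(width, width)
        up_proj = spectral_init_std(width, 4 * width)
        lr = spectral_lr(width, 4 * width, base_lr=0.1)
        print(f"width={width} square_std={square:.5f} up_std={up_proj:.5f} up_lr={lr:.3f}")
```

With this rule, a layer that narrows (fan_out < fan_in) gets a smaller init std than the usual 1/sqrt(fan_in), while square and widening layers keep the 1/sqrt(fan_in) scale.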

Readme: https://github.com/huggingface/nanotron/tree/xrsrke/mu_transfer/examples/mup

How to run it?

CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/mup/configs/mup_350m_llama_config.yaml

3outeille commented 5 months ago

Please add a README that describes what µTransfer does and shows some curves.

xrsrke commented 5 months ago

Ready for review!

NouamaneTazi commented 5 months ago

Make sure tests pass before merging 🙏