xrsrke closed this 5 months ago
An implementation of "A Spectral Condition for Feature Learning", a follow-up to Greg Yang's µTransfer.
https://arxiv.org/abs/2310.17813
Readme: https://github.com/huggingface/nanotron/tree/xrsrke/mu_transfer/examples/mup
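Here is a minimal sketch of the spectral scaling condition from the linked paper, as we read it. For a layer with `fan_in` inputs and `fan_out` outputs, the spectral norm of the weight matrix (and of each update to it) should scale like sqrt(fan_out / fan_in). For plain SGD, the paper's recipe (up to constants) is an init std of (1/sqrt(fan_in)) · min(1, sqrt(fan_out / fan_in)) and a per-layer learning rate proportional to fan_out / fan_in. This is not the nanotron code from this PR. The function names are hypothetical and the code is pure Python for illustration only. The power-iteration rescale is one direct way to hit the target norm at init, not necessarily what the PR does.

```python
import math
import random

Matrix = list[list[float]]


def spectral_init_std(fan_in: int, fan_out: int) -> float:
    # Hypothetical helper: Gaussian init std (up to a constant),
    # (1 / sqrt(fan_in)) * min(1, sqrt(fan_out / fan_in)).
    return (1.0 / math.sqrt(fan_in)) * min(1.0, math.sqrt(fan_out / fan_in))


def spectral_sgd_lr(fan_in: int, fan_out: int, base_lr: float = 1.0) -> float:
    # Hypothetical helper: per-layer SGD learning rate scaled by fan_out / fan_in.
    return base_lr * fan_out / fan_in


def target_spectral_norm(fan_in: int, fan_out: int) -> float:
    # Spectral condition: ||W||_* should be on the order of sqrt(fan_out / fan_in).
    return math.sqrt(fan_out / fan_in)


def spectral_norm(w: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Estimate the largest singular value of w (fan_out rows x fan_in cols)
    by power iteration on W^T W, starting from a seeded random unit vector."""
    rng = random.Random(seed)
    n_out, n_in = len(w), len(w[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(n_in)) for i in range(n_out)]  # u = W v
        z = [sum(w[i][j] * u[i] for i in range(n_out)) for j in range(n_in)]  # z = W^T u
        norm = math.sqrt(sum(x * x for x in z))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in z]
    u = [sum(w[i][j] * v[j] for j in range(n_in)) for i in range(n_out)]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| for the top singular vector v


def gaussian_init(fan_in: int, fan_out: int, seed: int = 0) -> Matrix:
    # Gaussian init with the spectral std; rows = fan_out, cols = fan_in.
    rng = random.Random(seed)
    std = spectral_init_std(fan_in, fan_out)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def rescale_to_spectral_target(w: Matrix) -> Matrix:
    # Optional: rescale w so its estimated spectral norm equals sqrt(fan_out / fan_in).
    fan_out, fan_in = len(w), len(w[0])
    scale = target_spectral_norm(fan_in, fan_out) / spectral_norm(w)
    return [[x * scale for x in row] for row in w]
```

In a real training setup you would use a library routine for the spectral norm (e.g. a matrix 2-norm on the GPU) instead of the pure-Python power iteration, which is kept here only so the sketch has no dependencies.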
How to run it?
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/mup/configs/mup_350m_llama_config.yaml
Add a README describing what µTransfer does and showing some training curves
Ready for review!
Make sure tests pass before merging 🙏