microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

add width_mult to optimizer dict #65

Open marcobellagente93 opened 8 months ago

marcobellagente93 commented 8 months ago

Adding the width_mult key to the MuAdam state dictionary to make the class easier to use, e.g. to enable its correct use in https://github.com/EleutherAI/gpt-neox
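
For illustration only, here is a minimal sketch of one reading of this change. It assumes the key ends up on each optimizer parameter-group dict next to the width-scaled `lr`. It uses plain Python dicts rather than the real `mup` or PyTorch classes. The helper name `build_mu_adam_groups`, the tuple layout for parameters, and the choice of `width_mult = 1.0` for vector-like parameters are all hypothetical and are not taken from the PR diff.

```python
from collections import defaultdict


def build_mu_adam_groups(params, lr, weight_decay=0.0):
    """Build Adam-style parameter groups with muP-like scaling.

    `params` is a list of (name, width_mult, is_matrix_like) tuples that
    stand in for real tensors (hypothetical layout, for illustration).
    """
    matrix_like = defaultdict(list)  # width_mult -> parameter names
    vector_like = []
    for name, width_mult, is_matrix_like in params:
        if is_matrix_like:
            matrix_like[width_mult].append(name)
        else:
            vector_like.append(name)

    groups = []
    for width_mult, names in matrix_like.items():
        groups.append({
            "params": names,
            # Matrix-like parameters: learning rate shrinks with width.
            "lr": lr / width_mult,
            "weight_decay": weight_decay * width_mult,
            # The extra key: lets downstream code see which multiplier
            # was applied to this group.
            "width_mult": width_mult,
        })
    # Vector-like parameters keep the base settings; 1.0 is an assumption.
    groups.append({
        "params": vector_like,
        "lr": lr,
        "weight_decay": weight_decay,
        "width_mult": 1.0,
    })
    return groups


groups = build_mu_adam_groups(
    [("w1", 4.0, True), ("w2", 4.0, True), ("b", 1.0, False)],
    lr=1e-3,
    weight_decay=0.1,
)
```

With the multiplier stored on the group, a training framework that rebuilds or reschedules learning rates can read `group["width_mult"]` instead of recomputing it from the parameter shapes.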