Open marcobellagente93 opened 8 months ago
Adding the width_mult key to the MuAdam state dictionary to make it more easy to use the class, e.g. to enable its correct use in https://github.com/EleutherAI/gpt-neox
width_mult
Adding the
width_mult
key to the MuAdam state dictionary to make it more easy to use the class, e.g. to enable its correct use in https://github.com/EleutherAI/gpt-neox