microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

Allowing users to create their own shapes #37

TevenLeScao closed 1 year ago

TevenLeScao commented 1 year ago

This feature is needed for users who want to define the get_shapes function themselves, for example in tensor-parallel environments where each process sees only a fraction of the parameters.
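To illustrate the tensor-parallel case, here is a minimal sketch of what a user-defined shape function might compute. It is not mup's API: the helper name `get_full_shapes`, the `partition_dims` mapping, the parameter names, and the `tp_size` argument are all hypothetical, and the sketch assumes every partitioned parameter is split evenly along a single dimension.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, ...]


def get_full_shapes(
    local_shapes: Dict[str, Shape],
    partition_dims: Dict[str, Optional[int]],
    tp_size: int,
) -> Dict[str, Shape]:
    """Reconstruct unpartitioned shapes from one rank's local shards.

    Hypothetical helper: `partition_dims` maps a parameter name to the
    dimension along which it is split across `tp_size` ranks. Parameters
    that are missing from the mapping (or mapped to None) are treated as
    replicated on every rank.
    """
    full: Dict[str, Shape] = {}
    for name, shape in local_shapes.items():
        dim = partition_dims.get(name)
        if dim is None:
            # Replicated parameter: the local shape is already the full shape.
            full[name] = tuple(shape)
        else:
            # Sharded parameter: scale the split dimension back up.
            full[name] = tuple(
                size * tp_size if i == dim else size
                for i, size in enumerate(shape)
            )
    return full


# Assumed example: hidden size 1024, MLP width 4096, 4 tensor-parallel ranks.
local = {
    "mlp.dense_h_to_4h.weight": (1024, 1024),  # split along output dim 0
    "mlp.dense_4h_to_h.weight": (1024, 1024),  # split along input dim 1
    "layernorm.weight": (1024,),               # replicated
}
partition = {
    "mlp.dense_h_to_4h.weight": 0,
    "mlp.dense_4h_to_h.weight": 1,
}
full = get_full_shapes(local, partition, tp_size=4)
```

The point of the sketch is that the shapes a single process observes are not the shapes of the logical model, so any width-dependent scaling should be derived from the reconstructed full shapes rather than from the local shards.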