microsoft / mup

maximal update parametrization (µP)
https://arxiv.org/abs/2203.03466
MIT License

Unclear `assert_hidden_size_inf` triggers #62

Closed: dreavjr closed this issue 8 months ago

dreavjr commented 9 months ago

My code is triggering the "has infinite fan-in and finite fan-out dimensions but is not type MuReadout" assertion in non-obvious situations, i.e., on layers other than the last linear layer of the model.

What am I doing wrong? Is there a good way to debug those situations?
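One way to debug this is to list which layers change shape on which side when the model is widened. The sketch below is a hypothetical, dependency-free helper (`find_suspect_layers` is not part of mup's API). It assumes that a dimension counts as "infinite" when it differs between a base model and a wider target model, and it flags every layer whose fan-in is infinite but whose fan-out is finite, unless that layer is the intended readout. The example model and its layer names are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Weight shapes follow the PyTorch nn.Linear convention: (fan_out, fan_in).
Shape = Tuple[int, int]


def find_suspect_layers(
    base: Dict[str, Shape],
    target: Dict[str, Shape],
    readout_names: Set[str],
) -> List[str]:
    """Return names of layers with infinite fan-in but finite fan-out.

    Assumption: a dimension is "infinite" if it differs between the base
    and the target model. Layers in `readout_names` are the ones meant to
    be the model's readout (MuReadout) and are therefore allowed.
    """
    suspects = []
    for name, (out_t, in_t) in target.items():
        out_b, in_b = base[name]
        fan_out_inf = out_t != out_b
        fan_in_inf = in_t != in_b
        if fan_in_inf and not fan_out_inf and name not in readout_names:
            suspects.append(name)
    return suspects


# Hypothetical model: width grows 128 -> 512, but "proj" squeezes the
# activations to a fixed size of 32 before the real output layer "head".
base = {"embed": (128, 10), "hidden": (128, 128), "proj": (32, 128), "head": (5, 32)}
target = {"embed": (512, 10), "hidden": (512, 512), "proj": (32, 512), "head": (5, 32)}

print(find_suspect_layers(base, target, readout_names={"head"}))  # ['proj']
```

Here `proj` is flagged even though it is not the last layer, because its input scales with width while its output stays fixed.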

dreavjr commented 8 months ago

Okay, I think I finally got it!

I cannot simply apply mup to the individual parameters of a vanilla model/layer/block and expect it to work every time; sometimes the model/layer/block has to be reparameterized. In particular, the widths of all layers in an MLP-like block have to grow or shrink in tandem, except possibly for the output layer of the model.
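To make "grow or shrink in tandem" concrete, the sketch below compares a hypothetical two-layer MLP block whose hidden size is pinned to a constant against one whose hidden size scales with the model width (4 × width is an arbitrary choice here). It labels each side of each weight as `inf` or `fin`, again assuming a dimension is infinite when it differs between a base width and a larger width. The helper names are illustrative and not part of mup.

```python
from typing import Dict, Optional, Tuple

Shape = Tuple[int, int]  # (fan_out, fan_in), as in nn.Linear.weight


def mlp_block_shapes(width: int, hidden: Optional[int] = None) -> Dict[str, Shape]:
    """Weight shapes of a hypothetical up/down MLP block.

    If `hidden` is None, the hidden size scales with the width (4 * width);
    otherwise it is pinned to the given constant.
    """
    h = 4 * width if hidden is None else hidden
    return {"up": (h, width), "down": (width, h)}


def _label(target_dim: int, base_dim: int) -> str:
    # Assumption: a dimension is "infinite" if it changes with width.
    return "inf" if target_dim != base_dim else "fin"


def inf_pattern(base: Dict[str, Shape], target: Dict[str, Shape]) -> Dict[str, str]:
    """Describe each layer as '<fan_out label>-out/<fan_in label>-in'."""
    pattern = {}
    for name, (out_t, in_t) in target.items():
        out_b, in_b = base[name]
        pattern[name] = f"{_label(out_t, out_b)}-out/{_label(in_t, in_b)}-in"
    return pattern


# Pinned hidden size: "up" has infinite fan-in but finite fan-out.
print(inf_pattern(mlp_block_shapes(128, hidden=64), mlp_block_shapes(512, hidden=64)))
# {'up': 'fin-out/inf-in', 'down': 'inf-out/fin-in'}

# Reparameterized: the hidden size grows in tandem with the width.
print(inf_pattern(mlp_block_shapes(128), mlp_block_shapes(512)))
# {'up': 'inf-out/inf-in', 'down': 'inf-out/inf-in'}
```

In the pinned variant, `up` is exactly the "infinite fan-in, finite fan-out" case the assertion complains about, even though it sits in the middle of the model. Letting the hidden size scale with width removes it, leaving the model's final output layer as the only place where that pattern is expected.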

I am closing this for now.