callumm-graphcore closed this issue 1 year ago
Hi Callum,
Yes, it's possible to have parameters with only finite dimensions. For example, given a finite output dimension d_out, the bias vector for the last layer will have dimension 1 x d_out.
Thanks Edward! Is there a part of the paper that explains what the correct scaling is in this case? Would this apply even if you had a linear layer where neither the input nor the output dimension was scaled?
The bias example I gave is covered under "input weights & biases" in Tables 3, 8, and 9, and it has a constant init and LR.
Yes, the same rule applies to a linear layer where neither the input nor the output dimension is scaled. We may not have discussed that case explicitly in the paper since it's less common, but you should use a constant init and LR there as well.
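To make the rule concrete, here is a minimal sketch of how one might map each parameter's "infinite" (width-scaled) dimensions to muP-style init and Adam LR multipliers, following the pattern of Table 8 in the paper. This is illustrative pseudocode, not the `mup` package API; the function name and axis convention are made up, and the key point is the first branch: a parameter with no infinite dimensions gets a constant (width-independent) init and LR.

```python
# Hypothetical helper (not the mup package API) sketching the Adam-column
# scaling rules of Table 8. Width is denoted n; "infinite" dims are the
# axes that grow with n. Axis 0 is fan-out, the last axis is fan-in.

def mup_scaling(shape, infinite_dims):
    """Return (init_variance_mult, lr_mult), each a function of width n."""
    fan_in_infinite = (len(shape) - 1) in infinite_dims and len(shape) > 1
    fan_out_infinite = 0 in infinite_dims

    if not infinite_dims:
        # No infinite dimensions, e.g. a bias of shape (d_out,) with finite
        # d_out, or a finite -> finite linear layer: constant init and LR.
        return (lambda n: 1.0, lambda n: 1.0)
    if fan_in_infinite and fan_out_infinite:
        # Hidden weights (n x n): init variance ~ 1/n, Adam LR ~ 1/n.
        return (lambda n: 1.0 / n, lambda n: 1.0 / n)
    if fan_in_infinite:
        # Output weights (d_out x n): init variance ~ 1/n^2, Adam LR ~ 1/n.
        return (lambda n: 1.0 / n ** 2, lambda n: 1.0 / n)
    # Input weights & biases (only fan-out infinite): constant init and LR.
    return (lambda n: 1.0, lambda n: 1.0)


# Usage: a purely finite parameter is untouched by width scaling.
init_mult, lr_mult = mup_scaling((10,), infinite_dims=set())
print(init_mult(4096), lr_mult(4096))  # both 1.0 at any width
```

The point of the sketch is just the classification: only dimensions that grow with width trigger width-dependent scaling, so a parameter whose every dimension is finite behaves identically at all widths.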
Ah, OK, I see now. Thank you very much!
Yes, that is allowed.
Hi,
Is it valid to have parameters that have no "infinite" dimensions? This line suggests that it is, but I can't find anything in the paper that explains how this case should be dealt with.
With thanks, Callum