Hello, thanks for sharing the code! Have you tried T-Fixup initialization on language-modeling tasks with encoder-only transformers like BERT? Since there is no decoder in that setting, do you have any suggestions on how to initialize the encoder so that layer norm can be removed, as shown in the paper?
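
For context, here is roughly what I had in mind: a minimal sketch that assumes the paper's encoder-side rule (Xavier everywhere, N(0, d^{-1/2}) for input embeddings, then scaling the attention value/output projections, the feed-forward weights, and the embeddings by 0.67·N^{-1/4}) carries over unchanged when the decoder is absent. The function name `tfixup_init_encoder` and the use of `nn.TransformerEncoderLayer` are just for illustration, not from your repo:

```python
import torch.nn as nn


def tfixup_init_encoder(embedding: nn.Embedding,
                        encoder: nn.TransformerEncoder,
                        d_model: int) -> None:
    """Tentative encoder-only T-Fixup (assumption: the paper's
    encoder-side scaling rule applies as-is without a decoder)."""
    num_layers = len(encoder.layers)
    scale = 0.67 * num_layers ** -0.25  # 0.67 * N^(-1/4)

    # Input embeddings: Gaussian with std d^(-1/2), then scaled.
    nn.init.normal_(embedding.weight, mean=0.0, std=d_model ** -0.5)
    embedding.weight.data.mul_(scale)

    for layer in encoder.layers:
        attn = layer.self_attn
        # Xavier for the packed q/k/v projection and the output projection.
        nn.init.xavier_uniform_(attn.in_proj_weight)
        nn.init.xavier_uniform_(attn.out_proj.weight)
        # Scale only the value projection (last d_model rows of the
        # packed weight) and the output projection.
        attn.in_proj_weight.data[2 * d_model:].mul_(scale)
        attn.out_proj.weight.data.mul_(scale)

        # Feed-forward weights: Xavier, then scaled.
        for lin in (layer.linear1, layer.linear2):
            nn.init.xavier_uniform_(lin.weight)
            lin.weight.data.mul_(scale)

        # T-Fixup trains without layer norm; replacing the norms with
        # identities here just to illustrate (a custom layer without
        # LayerNorm would be the real implementation).
        layer.norm1 = nn.Identity()
        layer.norm2 = nn.Identity()


# Illustrative usage:
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
enc = nn.TransformerEncoder(layer, num_layers=6)
emb = nn.Embedding(30522, 512)
tfixup_init_encoder(emb, enc, d_model=512)
```

Does that look like the right adaptation, or does the scaling constant need to change when there is no decoder?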