layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
MIT License · 89 stars · 11 forks
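For orientation on what the issues below are asking about: per the paper, T-Fixup removes layer normalization and learning-rate warmup, and instead shrinks certain Xavier-initialized weights (the value and output projections of each attention block, the FFN weights, and the input embeddings) by 0.67 · N^(-1/4) on the encoder side and (9M)^(-1/4) on the decoder side, where N and M are the encoder and decoder layer counts. The sketch below only illustrates those scaling rules; the parameter names (q_proj, v_proj, fc1, ...) are hypothetical and this is not the repository's actual implementation.

```python
import torch
import torch.nn as nn


def t_fixup_weight_(weight: torch.Tensor, scale: float) -> None:
    # Xavier-initialize, then shrink in place by the T-Fixup scale factor.
    nn.init.xavier_uniform_(weight)
    with torch.no_grad():
        weight.mul_(scale)


def t_fixup_init(d_model: int, d_ff: int, vocab: int,
                 n_enc: int, n_dec: int) -> dict[str, torch.Tensor]:
    """Minimal sketch of the T-Fixup scaling rules (illustrative names)."""
    # Per-side scale factors from the paper:
    # encoder: 0.67 * N^(-1/4), decoder: (9 * M)^(-1/4).
    enc_scale = 0.67 * n_enc ** -0.25
    dec_scale = (9 * n_dec) ** -0.25

    p: dict[str, torch.Tensor] = {}

    # Input embeddings start as N(0, d^(-1/2)) Gaussians, then are
    # shrunk by the per-side factor.
    for side, scale in (("encoder", enc_scale), ("decoder", dec_scale)):
        emb = torch.empty(vocab, d_model).normal_(0.0, d_model ** -0.5)
        p[f"{side}.embed"] = emb * scale

    # Within each attention block, only the value and output projections
    # are rescaled; query/key projections keep plain Xavier init. The two
    # FFN weights are rescaled as well. Decoder layers follow the same
    # pattern with dec_scale.
    for layer in range(n_enc):
        for name, shape in (
            ("self_attn.q_proj", (d_model, d_model)),
            ("self_attn.k_proj", (d_model, d_model)),
            ("self_attn.v_proj", (d_model, d_model)),
            ("self_attn.out_proj", (d_model, d_model)),
            ("fc1", (d_ff, d_model)),
            ("fc2", (d_model, d_ff)),
        ):
            w = torch.empty(*shape)
            scaled = name.split(".")[-1] in ("v_proj", "out_proj", "fc1", "fc2")
            t_fixup_weight_(w, enc_scale if scaled else 1.0)
            p[f"encoder.{layer}.{name}"] = w

    # No layer-norm parameters appear here: T-Fixup removes them entirely.
    return p
```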
Issues (newest first)
#8 · Question: initialization for the case of multi-head attention · opened by t-taniai 2 years ago · 1 comment
#7 · Does adding layer norm together with T-Fixup make the model even better, or does T-Fixup make layer norm completely unnecessary (i.e. no performance gain)? · opened by yxchng 3 years ago · 0 comments
#6 · T-Fixup for Language Modeling · by sairams-intel, closed 3 years ago · 1 comment
#5 · Details for initializing FFN (MLP blocks)? · by zhuchen03, closed 1 year ago · 1 comment
#4 · Is it necessary to make the number of encoder layers equal to the number of decoder layers? · by SefaZeng, closed 1 year ago · 1 comment
#3 · Gradient explosion when training deep models with FP16 · by pluiez, closed 4 years ago · 1 comment
#2 · Possible minor typo in the paper · by kdexd, closed 4 years ago · 1 comment
#1 · FP16 Training · by libeineu, closed 4 years ago · 2 comments