layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
MIT License · 89 stars · 11 forks
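For orientation on what the issues below are asking about: per the paper, T-Fixup removes layer normalization and learning-rate warmup, and instead shrinks certain Xavier-initialized weights (the value and output projections of each attention block, the FFN weights, and the input embeddings) by 0.67 · N^(-1/4) on the encoder side and (9M)^(-1/4) on the decoder side, where N and M are the encoder and decoder layer counts. The sketch below only illustrates those scaling rules; the parameter names (q_proj, v_proj, fc1, ...) are hypothetical and this is not the repository's actual implementation.

```python
import torch
import torch.nn as nn


def t_fixup_weight_(weight: torch.Tensor, scale: float) -> None:
    # Xavier-initialize, then shrink in place by the T-Fixup scale factor.
    nn.init.xavier_uniform_(weight)
    with torch.no_grad():
        weight.mul_(scale)


def t_fixup_init(d_model: int, d_ff: int, vocab: int,
                 n_enc: int, n_dec: int) -> dict[str, torch.Tensor]:
    """Minimal sketch of the T-Fixup scaling rules (illustrative names)."""
    # Per-side scale factors from the paper:
    # encoder: 0.67 * N^(-1/4), decoder: (9 * M)^(-1/4).
    enc_scale = 0.67 * n_enc ** -0.25
    dec_scale = (9 * n_dec) ** -0.25

    p: dict[str, torch.Tensor] = {}

    # Input embeddings start as N(0, d^(-1/2)) Gaussians, then are
    # shrunk by the per-side factor.
    for side, scale in (("encoder", enc_scale), ("decoder", dec_scale)):
        emb = torch.empty(vocab, d_model).normal_(0.0, d_model ** -0.5)
        p[f"{side}.embed"] = emb * scale

    # Within each attention block, only the value and output projections
    # are rescaled; query/key projections keep plain Xavier init. The two
    # FFN weights are rescaled as well. Decoder layers follow the same
    # pattern with dec_scale.
    for layer in range(n_enc):
        for name, shape in (
            ("self_attn.q_proj", (d_model, d_model)),
            ("self_attn.k_proj", (d_model, d_model)),
            ("self_attn.v_proj", (d_model, d_model)),
            ("self_attn.out_proj", (d_model, d_model)),
            ("fc1", (d_ff, d_model)),
            ("fc2", (d_model, d_ff)),
        ):
            w = torch.empty(*shape)
            scaled = name.split(".")[-1] in ("v_proj", "out_proj", "fc1", "fc2")
            t_fixup_weight_(w, enc_scale if scaled else 1.0)
            p[f"encoder.{layer}.{name}"] = w

    # No layer-norm parameters appear here: T-Fixup removes them entirely.
    return p
```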
Issues (newest first)
#8 · Question: initialization for the case of multi-head attention · opened by t-taniai 2 years ago · 1 comment
#7 · Does adding layer norm together with T-Fixup make the model even better, or does T-Fixup make layer norm completely unnecessary (i.e. no performance gain)? · opened by yxchng 3 years ago · 0 comments
#6 · T-Fixup for Language Modeling · by sairams-intel, closed 3 years ago · 1 comment
#5 · Details for initializing FFN (MLP blocks)? · by zhuchen03, closed 1 year ago · 1 comment
#4 · Is it necessary to make the number of encoder layers equal to the number of decoder layers? · by SefaZeng, closed 1 year ago · 1 comment
#3 · Gradient explosion when training deep models with FP16 · by pluiez, closed 4 years ago · 1 comment
#2 · Possible minor typo in the paper · by kdexd, closed 4 years ago · 1 comment
#1 · FP16 Training · by libeineu, closed 4 years ago · 2 comments