issues
majumderb/rezero
Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"
https://arxiv.org/pdf/2003.04887.pdf
MIT License
407 stars
52 forks
Bump torch from 1.4.0 to 2.2.0
#19
dependabot[bot]
opened
3 months ago
0
Add batch_first, dtype, device arguments
#18
lericson
opened
1 year ago
0
Learning rate of the Param resweight
#17
Polarisjame
opened
2 years ago
0
resweight is almost 0
#16
burcehan
opened
2 years ago
1
Is ReZero applicable to fine-tuning?
#15
encounter1997
opened
3 years ago
0
weight decay for the resweight?
#14
Kyeongpil
opened
3 years ago
2
Can you release the code for ResNet-56 in Table 2?
#13
cuge1995
opened
4 years ago
0
The description of RZTXDecoderLayer is the same as EncoderLayer
#12
jiang-yuan
closed
2 years ago
0
Sry guys but your paper is not worth more than zero :)
#11
AmorfEvo
closed
4 years ago
1
Can ReZero be applied to CNNs?
#10
carr123
closed
4 years ago
1
The order of dropout and *resweight
#9
OneDirection9
closed
4 years ago
3
NaN gradients when applying ReZero to BERT or GPT
#8
yyht
opened
4 years ago
5
Does it work in not so deep architectures?
#7
wotulong
closed
4 years ago
3
Relationship between ReZero and Zero gamma trick
#6
hukkai
closed
4 years ago
2
does rezero work in machine translation tasks?
#5
zherowolf
closed
4 years ago
3
rezero with norm
#4
GallonDeng
closed
4 years ago
1
I don't see any application other than NLP?
#3
nile649
closed
4 years ago
1
Update README.md: add BibTeX header
#2
mpariente
closed
4 years ago
0
Can the method be applied to CNN?
#1
JunMa11
closed
4 years ago
1