majumderb / rezero

Official PyTorch Repo for "ReZero is All You Need: Fast Convergence at Large Depth"

https://arxiv.org/pdf/2003.04887.pdf

MIT License

407 stars 52 forks

issues

Bump torch from 1.4.0 to 2.2.0

#19 dependabot[bot] opened 3 months ago
0
Add batch_first, dtype, device arguments

#18 lericson opened 1 year ago
0
Learning rate of the Param resweight

#17 Polarisjame opened 2 years ago
0
resweight is almost 0

#16 burcehan opened 2 years ago
1
Is ReZero applicable to fine-tuning?

#15 encounter1997 opened 3 years ago
0
weight decay for the resweight?

#14 Kyeongpil opened 3 years ago
2
Can you release the code for ResNet-56 in Table 2?

#13 cuge1995 opened 4 years ago
0
The description of RZTXDecoderLayer is the same as EncoderLayer

#12 jiang-yuan closed 2 years ago
0
Sry guys but your paper is not worth more than zero :)

#11 AmorfEvo closed 4 years ago
1
can rezero be applied to cnn ?

#10 carr123 closed 4 years ago
1
The order of dropout and *resweight

#9 OneDirection9 closed 4 years ago
3
when apply rezero to bert or gpt, get NAN gradients

#8 yyht opened 4 years ago
5
Does it work in not so deep architectures?

#7 wotulong closed 4 years ago
3
Relationship between ReZero and Zero gamma trick

#6 hukkai closed 4 years ago
2
does rezero work in machine translation tasks?

#5 zherowolf closed 4 years ago
3
rezero with norm

#4 GallonDeng closed 4 years ago
1
I don't see any other application other than NLP?

#3 nile649 closed 4 years ago
1
Update README.md: add BibTeX header

#2 mpariente closed 4 years ago
0
Can the method be applied to CNN?

#1 JunMa11 closed 4 years ago
1