layer6ai-labs / T-Fixup

Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
MIT License

Possible minor typo in the paper. #2

Closed: kdexd closed this issue 4 years ago

kdexd commented 4 years ago

Hi, thanks for releasing the code, and great paper with strong results! This is not directly related to the codebase: I spotted a possible typo in the paper (ICML 2020 pre-proceedings). Shouldn't Equation 3 here have a + sign inside the braces, since $-\eta$ has been factored out?

[Screenshot of Equation 3 from the paper]

Please correct me if I am wrong, or missing something here.
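The sign argument in this comment can be checked in general. Writing the two terms inside the braces as placeholders $a$ and $b$ (hypothetical stand-ins, not the paper's actual symbols), factoring $-\eta$ out of two terms that each carry $-\eta$ leaves a + between them:

```latex
-\eta a - \eta b = -\eta\,(a + b),
\qquad\text{whereas}\qquad
-\eta\,(a - b) = -\eta a + \eta b .
```

So if both terms originally carry $-\eta$, the braces should contain a +, as suggested above; a minus inside would flip the sign of the second term.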

risingdhxs commented 4 years ago

Yes, thank you for pointing it out. We will correct it in the camera-ready version.