layer6ai-labs / T-Fixup

Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
MIT License
89 stars 11 forks source link