sIncerass / powernorm

[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
GNU General Public License v3.0

Comparisons with RMSNorm? #14

Closed by xiaoxin83121 1 year ago

xiaoxin83121 commented 1 year ago

Hi, I noticed that LLaMA uses RMSNorm in a pre-norm configuration. I read your paper a long time ago, and it now strikes me that the forward passes of PowerNorm and RMSNorm are nearly the same. Am I missing something?
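
To make the comparison concrete, here is a minimal NumPy sketch of how I understand the two forward passes. It is my own simplification and not the code from this repo: the function names `rms_norm` and `power_norm_v_sketch`, the `eps` value, and the 2-D `(batch, features)` input shape are all assumptions, and I leave out PowerNorm's running statistics and custom backward pass entirely. Both functions divide by a root mean square and apply a learned scale. As far as I can tell, the visible difference in this simplified form is the axis the mean square is taken over (plus a bias term).

```python
import numpy as np

def rms_norm(x, gain, eps=1e-6):
    # RMSNorm-style forward: mean square over the feature axis,
    # computed independently for each row (token) of x.
    ms = np.mean(x ** 2, axis=-1, keepdims=True)
    return x / np.sqrt(ms + eps) * gain

def power_norm_v_sketch(x, gain, bias, eps=1e-6):
    # Simplified PowerNorm-style forward (assumption: batch statistics only,
    # no running statistics): mean square over the batch axis,
    # computed independently for each feature column of x.
    ms = np.mean(x ** 2, axis=0, keepdims=True)
    return x / np.sqrt(ms + eps) * gain + bias

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # hypothetical input: 4 tokens, 8 features
gain = np.ones(8)
bias = np.zeros(8)

a = rms_norm(x, gain)
b = power_norm_v_sketch(x, gain, bias)

# After rms_norm each row has unit mean square; after the
# PowerNorm-style sketch each column does.
print(np.mean(a ** 2, axis=-1))
print(np.mean(b ** 2, axis=0))
```

If this reading is right, the two share the "no mean subtraction" idea but normalize along different axes, which is what I would like to confirm.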