wangqiangneu / MT-PaperReading

Record my paper reading about Machine Translation and other related works.

19-NIPS-Root Mean Square Layer Normalization #37

Open wangqiangneu opened 4 years ago

wangqiangneu commented 4 years ago

Introduction

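Similar to #36, this is another paper that follows the familiar pattern of tweaking the existing layer normalization and then claiming gains on a large number of tasks. What it actually does is change $LayerNorm(x) = g\frac{x-u}{v}+b,\ u=mean(x),\ v=std(x)$ into $RMSNorm(x)=g\frac{x}{rms(x)}+b,\ rms(x)=\sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^2}$. In other words, the terms that depend on the mean are removed from layer normalization; the paper claims that this re-centering property is not actually useful. Normalizing by the RMS still preserves the re-scaling property.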
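The two formulas above can be compared with a minimal pure-Python sketch. This is only an illustration of the note, not the paper's implementation: the function names `layer_norm` and `rms_norm`, the scalar `g` and `b` (in practice they would be learned per-dimension vectors), and the small `eps` added for numerical stability are all assumptions made here.

```python
import math


def layer_norm(x, g=1.0, b=0.0, eps=1e-8):
    """LayerNorm(x) = g * (x - u) / v + b, with u = mean(x), v = std(x)."""
    n = len(x)
    u = sum(x) / n
    # Population standard deviation (divide by N, not N - 1).
    v = math.sqrt(sum((xi - u) ** 2 for xi in x) / n)
    return [g * (xi - u) / (v + eps) + b for xi in x]


def rms_norm(x, g=1.0, b=0.0, eps=1e-8):
    """RMSNorm(x) = g * x / rms(x) + b, with rms(x) = sqrt(mean(x_i^2))."""
    n = len(x)
    rms = math.sqrt(sum(xi * xi for xi in x) / n)
    return [g * xi / (rms + eps) + b for xi in x]


x = [1.0, 2.0, 3.0, 6.0]
scaled = [10.0 * xi for xi in x]   # re-scaled input
shifted = [xi + 5.0 for xi in x]   # re-centered (shifted) input

print("RMSNorm(x)        ", rms_norm(x))
print("RMSNorm(10 * x)   ", rms_norm(scaled))    # re-scaling is preserved
print("RMSNorm(x + 5)    ", rms_norm(shifted))   # shifting changes the output
print("LayerNorm(x + 5)  ", layer_norm(shifted)) # LayerNorm ignores the shift
```

Scaling the input leaves the RMSNorm output essentially unchanged, while adding a constant offset changes it; LayerNorm is insensitive to both. When the input already has zero mean, `rms(x)` equals `std(x)` and the two functions agree.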

Paper information

Summary