Nice diagram, saving it.
I stared at this for a while feeling it didn't match my understanding of BN/LN; it turns out BN is applied in a special way for CNNs...
See https://stackoverflow.com/a/46692217/9601110 for details.
Originally the mean/var is computed over the B dimension alone, but the authors want elements at different spatial locations to be normalized with the same mean/var, which fits the convolutional structure better:
For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini-batch, over all locations.
For a CNN, the mean/var is therefore computed over B, H, and W together, so B×H×W acts as one effective mini-batch.
That way you only get C means/vars, not C×H×W of them.
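To make that concrete, here is a minimal PyTorch sketch (the toy shapes and variable names are my own, not from the thread) checking that per-channel statistics computed over the (B, H, W) axes reproduce what nn.BatchNorm2d does in training mode:

```python
import torch

# Claim above: for a (B, C, H, W) activation, BN reduces over B, H, W jointly,
# leaving only C means/vars (one per channel).
B, C, H, W = 8, 3, 5, 5                      # toy sizes, chosen arbitrarily
x = torch.randn(B, C, H, W)

mean = x.mean(dim=(0, 2, 3))                 # shape (C,): one mean per channel
var = x.var(dim=(0, 2, 3), unbiased=False)   # shape (C,): one var per channel

bn = torch.nn.BatchNorm2d(C, affine=False)   # training mode uses batch statistics
manual = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + bn.eps)
print(torch.allclose(bn(x), manual, atol=1e-6))   # expected: True
```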
Nice.
LN works the same way: mean/var is computed over C, H, W, giving B means/vars, with C×H×W acting as the effective hidden size.
IN computes mean/var over H, W, giving B×C means/vars.
GN computes mean/var over (C/G)×H×W within each group, giving B×G means/vars in total. With G=1 it reduces to LN, and with G=C it becomes instance norm.
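The same counting can be shown with plain reductions; a quick sketch (again with made-up toy shapes) just to make the reduction axes and the number of statistics explicit:

```python
import torch

B, C, H, W, G = 8, 6, 5, 5, 3     # toy sizes; G must divide C
x = torch.randn(B, C, H, W)

# LN: reduce over C, H, W            -> B statistics
ln_mean = x.mean(dim=(1, 2, 3))                              # shape (B,)
# IN: reduce over H, W               -> B*C statistics
in_mean = x.mean(dim=(2, 3))                                 # shape (B, C)
# GN: reduce over (C/G), H, W per group -> B*G statistics
gn_mean = x.view(B, G, C // G, H, W).mean(dim=(2, 3, 4))     # shape (B, G)

print(ln_mean.shape, in_mean.shape, gn_mean.shape)
```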
The implementation is also very simple:
https://arxiv.org/abs/1803.08494
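The GN paper does include a few-line code snippet; below is a rough PyTorch equivalent of the same idea (my own sketch, not the paper's code), sanity-checked against torch.nn.functional.group_norm:

```python
import torch
import torch.nn.functional as F

def group_norm(x, G, gamma, beta, eps=1e-5):
    # x: (B, C, H, W); gamma, beta: per-channel affine parameters of shape (C,)
    B, C, H, W = x.shape
    x = x.view(B, G, C // G, H, W)
    mean = x.mean(dim=(2, 3, 4), keepdim=True)                 # one stat per (sample, group)
    var = x.var(dim=(2, 3, 4), keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    x = x.view(B, C, H, W)
    return x * gamma[None, :, None, None] + beta[None, :, None, None]

# Sanity check against the built-in implementation.
B, C, H, W, G = 2, 8, 4, 4, 4
x = torch.randn(B, C, H, W)
gamma, beta = torch.ones(C), torch.zeros(C)
print(torch.allclose(group_norm(x, G, gamma, beta),
                     F.group_norm(x, G, gamma, beta), atol=1e-6))   # expected: True
```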
from: zihao