McKracken opened this issue 5 years ago
CyberZHG's implementation computes the standard deviation as

```python
variance = K.mean(K.square(inputs - mean), axis=-1, keepdims=True)
std = K.sqrt(variance + self.epsilon)
```

whereas mine computes it directly:

```python
std = K.std(x, axis=-1, keepdims=True)
```

I think maybe there are input sequences with length 0, i.e. sequences that are entirely masked. For such a constant (all-zero) input, `K.std` returns exactly 0, while CyberZHG's version keeps the standard deviation strictly positive by adding `epsilon` inside the square root. But you can safely use his LayerNormalization.
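A minimal sketch of why that difference matters, using NumPy in place of the Keras backend and an assumed `epsilon` of `1e-5` (not the exact code of either repository):

```python
import numpy as np

eps = 1e-5                    # assumed epsilon value, for illustration only
x = np.zeros((1, 4))          # an all-zero row, e.g. a fully masked sequence

mean = x.mean(axis=-1, keepdims=True)

# epsilon added *inside* the square root: the denominator can never be 0
variance = np.mean(np.square(x - mean), axis=-1, keepdims=True)
std_safe = np.sqrt(variance + eps)
print((x - mean) / std_safe)    # [[0. 0. 0. 0.]]

# standard deviation computed directly: it is exactly 0 for a constant row,
# so the normalization evaluates 0 / 0 and yields nan
std_plain = x.std(axis=-1, keepdims=True)
print((x - mean) / std_plain)   # [[nan nan nan nan]]
```

Once a `nan` enters the forward pass (or the gradient of the square root blows up at zero variance), it spreads to the loss and every other metric, which would match the behaviour reported below.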
Hi,

I was using only the LayerNormalization from your code in mine. I didn't change anything in it, apart from overriding the `compute_mask` function, since my input is an Embedding with `mask_zero=True`:

Code

But strangely I get `nan` for all the metrics I track while training and tuning (the loss function and others). I tried other implementations of the LayerNormalization layer (e.g. https://github.com/CyberZHG/keras-layer-normalization), and everything works without problems. I was wondering whether you have any clue about that.
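For reference, the kind of `compute_mask` override described above might look roughly like this; it is a hypothetical sketch, not the code from the original issue, and the built-in `tf.keras.layers.LayerNormalization` merely stands in for the layer from this repository:

```python
import tensorflow as tf

class MaskedLayerNormalization(tf.keras.layers.LayerNormalization):
    """Layer normalization that passes an incoming Keras mask through."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # accept a mask instead of raising an error

    def compute_mask(self, inputs, mask=None):
        # Propagate the mask from Embedding(mask_zero=True) unchanged, so
        # downstream layers still know which timesteps are padding.
        return mask

# Hypothetical usage: variable-length integer sequences padded with 0.
tokens = tf.keras.Input(shape=(None,), dtype="int32")
embedded = tf.keras.layers.Embedding(input_dim=1000, output_dim=64, mask_zero=True)(tokens)
normalized = MaskedLayerNormalization()(embedded)
```

Note that this only forwards the mask; it does not by itself prevent the zero-variance issue discussed above.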