lsdefine / attention-is-all-you-need-keras

A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need

Is the LayerNormalization class in the transformer needed? #5

Closed chaitjo closed 6 years ago

chaitjo commented 6 years ago

Keras implements a BatchNormalization layer. Isn't the LayerNormalization class the same thing?

Ref: https://keras.io/layers/normalization/

(Or is the code for a version of Keras where BN was not implemented?)

young-zonglin commented 6 years ago

They are not the same. BatchNorm normalizes each feature across the samples in a batch, while LayerNorm normalizes each sample across its own features, so LayerNorm works independently of batch size.

LayerNorm paper: Layer Normalization, https://arxiv.org/abs/1607.06450

BatchNorm paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, https://arxiv.org/abs/1502.03167
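A minimal NumPy sketch of the difference (not from this repo; the toy values and epsilon are illustrative): LayerNorm reduces over the feature axis of each sample, BatchNorm reduces over the batch axis of each feature.

```python
import numpy as np

# Toy batch: 2 samples, 3 features (hypothetical values).
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
eps = 1e-6

# LayerNorm: per-sample statistics over the feature axis (axis=-1).
ln_mean = x.mean(axis=-1, keepdims=True)
ln_std = x.std(axis=-1, keepdims=True)
layer_norm = (x - ln_mean) / (ln_std + eps)

# BatchNorm (inference-style, no running averages): per-feature
# statistics over the batch axis (axis=0).
bn_mean = x.mean(axis=0, keepdims=True)
bn_std = x.std(axis=0, keepdims=True)
batch_norm = (x - bn_mean) / (bn_std + eps)

# After LayerNorm, every row has mean ~0; after BatchNorm, every column does.
print(layer_norm)
print(batch_norm)
```

Because its statistics do not depend on other samples in the batch, LayerNorm behaves identically at train and test time, which is one reason the Transformer uses it.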

chaitjo commented 6 years ago

I see, thank you for the information!