kolloldas / torchnlp

Easy to use NLP library built on PyTorch and TorchText
Apache License 2.0

Add & Norm #11

Open xuwenshen opened 5 years ago

xuwenshen commented 5 years ago

The normalization seems different from the paper "Attention Is All You Need".

In the paper, the normalization layer comes after the multi-head attention and feed-forward sub-layers; in torchnlp, it comes before them:

    x = inputs

    # Layer Normalization
    x_norm = self.layer_norm_mha(x)

    # Multi-head attention
    y = self.multi_head_attention(x_norm, x_norm, x_norm)

    # Dropout and residual
    x = self.dropout(x + y)

    # Layer Normalization
    x_norm = self.layer_norm_ffn(x)

    # Positionwise Feedforward
    y = self.positionwise_feed_forward(x_norm)

    # Dropout and residual
    y = self.dropout(x + y)
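
For comparison, the post-norm ordering described in the paper would look roughly like this (a minimal sketch reusing the sub-layer names from the snippet above; not the library's actual code):

    x = inputs

    # Multi-head attention on the un-normalized input
    y = self.multi_head_attention(x, x, x)

    # Residual connection, then Layer Normalization (post-norm)
    x = self.layer_norm_mha(x + self.dropout(y))

    # Positionwise Feedforward
    y = self.positionwise_feed_forward(x)

    # Residual connection, then Layer Normalization (post-norm)
    x = self.layer_norm_ffn(x + self.dropout(y))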
kolloldas commented 5 years ago

Yes, it's from the updated Transformer model. You can find the TensorFlow version maintained by the authors here
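
For anyone comparing the two variants, below is a minimal, self-contained sketch of a pre-norm encoder layer in PyTorch; the module and parameter names are illustrative and are not torchnlp's API. Pre-norm stacks also typically apply one final LayerNorm after the last layer.

    import torch
    import torch.nn as nn

    class PreNormEncoderLayer(nn.Module):
        """Pre-norm Transformer encoder layer: LayerNorm -> sub-layer -> dropout -> residual."""

        def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
            super().__init__()
            self.norm_mha = nn.LayerNorm(d_model)
            self.norm_ffn = nn.LayerNorm(d_model)
            self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
            self.ffn = nn.Sequential(
                nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            # Normalize first, then attend, then add the residual
            h = self.norm_mha(x)
            attn_out, _ = self.mha(h, h, h)
            x = x + self.dropout(attn_out)
            # Same pattern for the feed-forward sub-layer
            h = self.norm_ffn(x)
            x = x + self.dropout(self.ffn(h))
            return x

    # Usage: (batch, seq_len, d_model) input; a full stack would normally
    # finish with one more nn.LayerNorm over the final output
    layer = PreNormEncoderLayer()
    out = layer(torch.randn(2, 10, 512))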