hengruo / QANet-pytorch

A PyTorch implementation of QANet.
MIT License
344 stars, 67 forks

the layer norm #7

Closed InitialBug closed 6 years ago

InitialBug commented 6 years ago

nn.LayerNorm is a module with learnable parameters. It not only normalizes the input but also learns to fit the possible data distribution, so I think the different layers in the encoder block (e.g. the conv layer, the self-attention layer, and the feed-forward layer) should each have their own learnable layernorm.
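To make the suggestion concrete, here is a minimal sketch of an encoder block that gives each sub-layer its own nn.LayerNorm. It is not the repository's code: the class name EncoderBlockSketch, the layer sizes, and the plain Conv1d and Linear sub-layers are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn


class EncoderBlockSketch(nn.Module):
    """Hypothetical, simplified encoder block (not the repository's implementation)."""

    def __init__(self, d_model: int = 128, num_convs: int = 2, num_heads: int = 8):
        super().__init__()
        # Simplified sub-layers: plain Conv1d and a single Linear, for illustration.
        self.convs = nn.ModuleList(
            [nn.Conv1d(d_model, d_model, kernel_size=7, padding=3) for _ in range(num_convs)]
        )
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn = nn.Linear(d_model, d_model)
        # One LayerNorm per sub-layer, so each learns its own weight and bias.
        self.conv_norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(num_convs)])
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, length, d_model).
        for conv, norm in zip(self.convs, self.conv_norms):
            y = norm(x).transpose(1, 2)  # Conv1d expects (batch, channels, length)
            x = x + torch.relu(conv(y)).transpose(1, 2)
        y = self.attn_norm(x)
        attn_out, _ = self.attn(y, y, y, need_weights=False)
        x = x + attn_out
        x = x + torch.relu(self.ffn(self.ffn_norm(x)))
        return x


block = EncoderBlockSketch(d_model=16, num_convs=2, num_heads=4)
out = block(torch.randn(3, 5, 16))
norms = [m for m in block.modules() if isinstance(m, nn.LayerNorm)]
print(out.shape, len(norms))
```

Sharing a single nn.LayerNorm instance across the sub-layers would instead tie their scale and shift parameters together, which is the setup this comment argues against.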

hengruo commented 6 years ago

@InitialBug You're right. I tested separate layernorms, but the results were almost identical, so I put this change off. Anyway, thanks a lot!