This paper provides new perspectives on the Transformer block, but I have a question about one of the details. As far as I know, the LayerNorm officially provided by PyTorch already implements the same function as the proposed MLN: it computes the mean and variance jointly over the token and channel dimensions. So where is the improvement?
The official example from the PyTorch documentation (imports added for completeness):

import torch
import torch.nn as nn

# Image Example
N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions
# (i.e. the channel and spatial dimensions), as shown in the image below
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)
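To make my question concrete, here is a minimal sketch (the shapes B, T, C and the manual-normalization check are my own illustration, not from the paper) showing that nn.LayerNorm with normalized_shape=[T, C] already normalizes jointly over the token and channel dimensions, which is what I understand MLN to do:

```python
import torch
import torch.nn as nn

# Hypothetical Transformer shapes: batch B, tokens T, channels C
B, T, C = 2, 4, 8
x = torch.randn(B, T, C)

# LayerNorm over BOTH the token and channel dimensions
ln = nn.LayerNorm([T, C])
out = ln(x)

# Manual check: mean/variance computed jointly over the last two dims
mean = x.mean(dim=(-2, -1), keepdim=True)
var = x.var(dim=(-2, -1), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + ln.eps)

# The two should match at initialization, where the affine
# transform (weight=1, bias=0) is the identity
print(torch.allclose(out, manual, atol=1e-5))
```

If MLN is numerically identical to this, the improvement would have to come from somewhere other than the normalization statistics themselves.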