erjiaxiao closed this issue 2 years ago
The picture about LN in README.md shows LN covering just one line of a layer. In my understanding, I guess LN should cover a whole layer instead. Am I wrong somewhere?
Have a look at "Leveraging Batch Normalization for Vision Transformers". It explains the differences between norm layers.
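To make the two readings of LN concrete, here is a minimal pure-Python sketch. It assumes a single sample stored as a tokens x channels nested list; the function names are hypothetical and not taken from this repo. In transformer-style models, LN statistics are usually computed per token over the channel dimension (the "one line" reading), while the "whole layer" reading shares one mean and variance across every value of the sample. The learnable scale and shift are left out for brevity.

```python
from statistics import fmean


def normalize(values, eps=1e-5):
    """Shift and scale a flat list to (approximately) zero mean, unit variance."""
    mean = fmean(values)
    var = fmean([(v - mean) ** 2 for v in values])
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def layer_norm_per_token(sample):
    """One mean/variance per token (row): the "one line of a layer" reading."""
    return [normalize(token) for token in sample]


def layer_norm_whole_sample(sample):
    """One mean/variance shared by all values of the sample: the "whole layer" reading."""
    n_channels = len(sample[0])
    flat = normalize([v for token in sample for v in token])
    # Restore the original tokens x channels shape.
    return [flat[i:i + n_channels] for i in range(0, len(flat), n_channels)]


# Two tokens with three channels each; the second token has a much larger scale.
sample = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
per_token = layer_norm_per_token(sample)
whole = layer_norm_whole_sample(sample)
```

With per-token statistics, each row ends up with zero mean on its own, so the scale difference between the two tokens disappears. With whole-sample statistics, only the sample as a whole has zero mean, and the small token stays below the large one.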