Closed richardbaihe closed 3 years ago
This paper investigates the trade-off between depth and width in the Transformer: when the number of parameters is large, depth matters more, while for smaller models, width matters more.
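To make the depth/width trade-off concrete, here is a minimal sketch of how the two trade against each other at a fixed parameter budget. It is not taken from the paper. It uses the common approximation that each Transformer layer has about `(4 + 2 * ffn_mult) * width**2` weights (four attention projections plus a two-layer feed-forward block), ignoring embeddings, biases, and layer norms. The function names and the `ffn_mult = 4` default are assumptions for illustration.

```python
import math


def transformer_params(depth: int, width: int, ffn_mult: int = 4) -> int:
    """Approximate non-embedding weight count (assumed formula, biases ignored)."""
    attention = 4 * width * width                # Q, K, V and output projections
    feed_forward = 2 * ffn_mult * width * width  # two linear layers of the FFN
    return depth * (attention + feed_forward)


def width_for_budget(budget: int, depth: int, ffn_mult: int = 4) -> int:
    """Largest width whose approximate weight count fits the budget at this depth."""
    per_layer_coeff = 4 + 2 * ffn_mult
    return math.isqrt(budget // (depth * per_layer_coeff))


if __name__ == "__main__":
    budget = transformer_params(depth=12, width=768)
    # Same budget spread over different depths: deeper means narrower.
    for depth in (6, 12, 24, 48):
        print(depth, width_for_budget(budget, depth))
```

Under this approximation, quadrupling the depth at a fixed budget halves the width, which is the kind of trade-off the paper's claim is about.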