mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

Transformer model with different parameters #23

Closed ChuanyangZheng closed 3 years ago

ChuanyangZheng commented 3 years ago

Hello, I am confused by your results on WMT'14 En-De and WMT'14 En-Fr. I wonder how you obtained the Transformer proposed by Vaswani et al. (2017) for WMT at different parameter counts such as 2.8M and 5.7M. By pruning, I guess?

Michaelvll commented 3 years ago

Thank you for asking! As we mentioned in the paper, we omit the word embedding lookup table from the model parameters. : )
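For reference, here is a minimal sketch (not the authors' actual counting script) of how one could reproduce such a count in PyTorch, assuming the word-embedding lookup tables are `nn.Embedding` modules; the function name is just for illustration:

```python
# Minimal sketch (not the authors' script): count parameters while excluding
# word-embedding lookup tables, assuming they are nn.Embedding modules.
import torch.nn as nn

def count_params_without_embeddings(model: nn.Module) -> int:
    # Collect the parameters that belong to embedding tables.
    embed_params = {
        id(p)
        for m in model.modules()
        if isinstance(m, nn.Embedding)
        for p in m.parameters()
    }
    # Count everything else.
    return sum(p.numel() for p in model.parameters() if id(p) not in embed_params)

# Toy example: the embedding table is ignored in the count.
toy = nn.Sequential(nn.Embedding(32000, 512), nn.Linear(512, 512))
print(count_params_without_embeddings(toy))  # 512*512 + 512 = 262656
```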

ChuanyangZheng commented 3 years ago

Thank you very much for your kind reply. However, you might not have gotten my point. I wonder how you compress the original Transformer into the different model sizes in Table 1. For example, the smallest 2.8M Transformer is much smaller than the original Transformer's 45M (not counting the word embedding).

Michaelvll commented 3 years ago

Thank you for asking! As we mentioned in the paper, we shrink the embedding size of the model to reduce the number of parameters, following the settings of the Evolved Transformer.
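For intuition, a rough back-of-the-envelope sketch (not the paper's exact configuration) shows how the non-embedding parameter count of a 6+6-layer encoder-decoder Transformer scales with the embedding size; it assumes `d_ff = 4 * d_model` and ignores biases and layer norms, so the exact settings in Table 1 may differ:

```python
# Approximate non-embedding parameter count of an encoder-decoder Transformer
# as a function of the embedding size d_model (assumes d_ff = 4 * d_model,
# ignores biases and layer norms).
def transformer_params(d_model: int, n_enc: int = 6, n_dec: int = 6) -> int:
    d_ff = 4 * d_model
    attn = 4 * d_model * d_model   # Q, K, V and output projections
    ffn = 2 * d_model * d_ff       # two feed-forward projections
    enc_layer = attn + ffn         # self-attention + FFN
    dec_layer = 2 * attn + ffn     # self-attention + cross-attention + FFN
    return n_enc * enc_layer + n_dec * dec_layer

print(transformer_params(512) / 1e6)  # ~44M, close to the base Transformer without embeddings
print(transformer_params(128) / 1e6)  # ~2.8M, roughly the smallest size discussed above
```

Shrinking the embedding size reduces the parameter count roughly quadratically, which is why a much smaller `d_model` brings the ~45M base model down to the few-million range.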