mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention
https://arxiv.org/abs/2004.11886

Model size confusion #12

Closed zml24 closed 4 years ago

zml24 commented 4 years ago

Hello, I read your paper and found that the smallest model size is 2.8M. However, when I ran your config with the smallest embedding size (160), the model size was about 5.2M. The embedding part alone is 8848 × 160 + 2 × 6632 × 160 = 3,537,920 ≈ 3.5M parameters. So how can I get the number 2.8M?
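The embedding arithmetic above can be checked directly. The vocabulary sizes 8848 and 6632 and the factor of 2 are taken from the comment; which table belongs to the encoder vs. the decoder is my guess:

```python
# Reproducing the embedding-size arithmetic quoted in the comment above.
# 8848 and 6632 are the vocabulary sizes given there; 160 is the embed dim.
embed_dim = 160
vocab_a = 8848   # one embedding table
vocab_b = 6632   # a table counted twice (e.g. input embedding + output projection)

embed_params = vocab_a * embed_dim + 2 * vocab_b * embed_dim
print(embed_params)                  # 3537920
print(round(embed_params / 1e6, 1))  # 3.5 (million parameters)
```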

Michaelvll commented 4 years ago

Hi, thank you for asking! The reported model size does not include the word embeddings. You can get that number by subtracting the word-embedding parameters from the total parameter count.
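A minimal sketch of that subtraction, assuming a PyTorch model whose word embeddings are `nn.Embedding` modules (as in fairseq-style models); the helper name is mine, not from the repo:

```python
import torch.nn as nn

def params_excluding_embeddings(model: nn.Module) -> int:
    """Total parameter count minus every nn.Embedding's parameters."""
    total = sum(p.numel() for p in model.parameters())
    embed = sum(p.numel()
                for m in model.modules() if isinstance(m, nn.Embedding)
                for p in m.parameters())
    return total - embed

# Toy check: Embedding(10, 4) has 40 params, Linear(4, 4) has 16 + 4 = 20.
toy = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 4))
print(params_excluding_embeddings(toy))  # 20
```

Note that `model.modules()` and `model.parameters()` deduplicate shared objects, so tied encoder/decoder embeddings are only subtracted once.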