meta-llama / llama

Inference code for Llama models

parameter count of Llama2-70B and Llama2-13B #1111

Open joyjitkundu032 opened 2 months ago

joyjitkundu032 commented 2 months ago

Hi All,

I am struggling to arrive at the 70B parameter count for the Llama2-70B model. Here is my calculation:

- Attention parameters per layer: 4 x 8192 x 8192
- MLP parameters per layer (gate, up, and down projections): 3 x 8192 x 28672
- 80 layers, vocab size 32000, embedding dim 8192

Total parameters ~ 80 x (4 x 8192 x 8192 + 3 x 8192 x 28672) + 32000 x 8192 ~ 78B.

Where am I going wrong?

I do get the correct count for the 13B model: Total parameters ~ 40 x (4 x 5120 x 5120 + 3 x 5120 x 13824) + 32000 x 5120 ~ 12.7B.
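For reference, here is a small Python sketch of the arithmetic above. It assumes every attention projection is d_model x d_model (i.e. standard multi-head attention) and counts only the per-layer weights plus the input embedding, ignoring the output head and norm weights:

```python
def naive_param_count(n_layers, d_model, d_ffn, vocab_size):
    # Q, K, V, O projections, each d_model x d_model (full multi-head attention assumed)
    attn = 4 * d_model * d_model
    # gate, up, and down projections in the SwiGLU MLP
    mlp = 3 * d_model * d_ffn
    # token embedding table (output head and RMSNorm weights ignored here)
    emb = vocab_size * d_model
    return n_layers * (attn + mlp) + emb

print(naive_param_count(80, 8192, 28672, 32000) / 1e9)  # ~78.1B, not ~70B
print(naive_param_count(40, 5120, 13824, 32000) / 1e9)  # ~12.9B, close to 13B
```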

Is it because of grouped-query attention in the 70B model?
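If so, a variant of the same sketch would shrink the K and V projections to d_model x (n_kv_heads x head_dim). The 64 query heads and 8 key/value heads below are my assumption for the 70B config:

```python
def gqa_param_count(n_layers, d_model, d_ffn, vocab_size, n_heads, n_kv_heads):
    head_dim = d_model // n_heads
    # Q and O stay d_model x d_model; K and V shrink to d_model x (n_kv_heads * head_dim)
    attn = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
    mlp = 3 * d_model * d_ffn
    emb = vocab_size * d_model
    return n_layers * (attn + mlp) + emb

# assuming 64 query heads and 8 key/value heads for the 70B model
print(gqa_param_count(80, 8192, 28672, 32000, 64, 8) / 1e9)  # ~68.7B
```

If that assumption is right, the remaining gap to ~70B would mostly come from the untied output projection (another 32000 x 8192) plus the norm weights.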