indhub opened this issue 2 months ago
I think this is a mistake in the paper too.
Any findings?
As defined in https://github.com/meta-llama/llama3/blob/main/llama/model.py, hidden_dim is initialized to 4h (four times the model dimension) and is then adjusted by ffn_dim_multiplier and multiple_of:
```python
# hidden_dim arrives here as 4 * dim from the FeedForward call site
hidden_dim = int(2 * hidden_dim / 3)
# custom dim factor multiplier
if ffn_dim_multiplier is not None:
    hidden_dim = int(ffn_dim_multiplier * hidden_dim)
# round hidden_dim up to the nearest multiple of multiple_of
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
```
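A quick way to sanity-check that computation is to plug in the model dimensions. Here is a minimal sketch, assuming the ffn_dim_multiplier and multiple_of values from the repo's published configs (1.3 and 1024 for 8B, 1.3 and 4096 for 70B; neither value is quoted above, so treat them as assumptions):

```python
def ffn_hidden_dim(dim: int, ffn_dim_multiplier: float | None, multiple_of: int) -> int:
    """Reproduce the hidden_dim computation from llama/model.py."""
    hidden_dim = 4 * dim                  # initial 4h
    hidden_dim = int(2 * hidden_dim / 3)  # SwiGLU 2/3 reduction
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # round up to the nearest multiple of multiple_of
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

print(ffn_hidden_dim(4096, 1.3, 1024))  # 14336 (8B)
print(ffn_hidden_dim(8192, 1.3, 4096))  # 28672 (70B)
```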
In the "The Llama 3 Herd of Models" paper, FFN dimension for the 8B, 70B and 405B models are stated as 6,144, 12,288 and 20,480. I would have expected the parameter count to stay the same as llama 3 where these were 14,336, 28,672 and 53,248. I downloaded the weights for the 70B model and checked - FFN dimension is indeed 28,672.
Did the paper get this wrong? Or am I reading it wrong?