indhub opened this issue 2 months ago
I think this is a mistake in the paper too.
Any findings?
As defined in https://github.com/meta-llama/llama3/blob/main/llama/model.py, hidden_dim is initialized to 4h (four times the model dimension) and is then adjusted by ffn_dim_multiplier and multiple_of:
```python
# hidden_dim arrives here as 4 * dim from the FeedForward call site
hidden_dim = int(2 * hidden_dim / 3)
# custom dim factor multiplier
if ffn_dim_multiplier is not None:
    hidden_dim = int(ffn_dim_multiplier * hidden_dim)
# round hidden_dim up to the nearest multiple of multiple_of
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
```
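A quick way to sanity-check that computation is to plug in the model dimensions. Here is a minimal sketch, assuming the ffn_dim_multiplier and multiple_of values from the repo's published configs (1.3 and 1024 for 8B, 1.3 and 4096 for 70B; neither value is quoted above, so treat them as assumptions):

```python
def ffn_hidden_dim(dim: int, ffn_dim_multiplier: float | None, multiple_of: int) -> int:
    """Reproduce the hidden_dim computation from llama/model.py."""
    hidden_dim = 4 * dim                  # initial 4h
    hidden_dim = int(2 * hidden_dim / 3)  # SwiGLU 2/3 reduction
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # round up to the nearest multiple of multiple_of
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

print(ffn_hidden_dim(4096, 1.3, 1024))  # 14336 (8B)
print(ffn_hidden_dim(8192, 1.3, 4096))  # 28672 (70B)
```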
In the "The Llama 3 Herd of Models" paper, FFN dimension for the 8B, 70B and 405B models are stated as 6,144, 12,288 and 20,480. I would have expected the parameter count to stay the same as llama 3 where these were 14,336, 28,672 and 53,248. I downloaded the weights for the 70B model and checked - FFN dimension is indeed 28,672.
Did the paper get this wrong? Or am I reading it wrong?