Open NTT123 opened 2 weeks ago
We are using a scale factor of `8` in the reference implementation, which seems to match the Hugging Face config files for 3.1 models.
However, I observed that the new 3.2 models use a scale factor of `32` (https://huggingface.co/meta-llama/Llama-3.2-1B/blob/main/config.json#L23). Could this mismatch cause any issues? https://github.com/meta-llama/llama-models/blob/4269717b2ea587627903bacbb75ccce1427ad914/models/llama3/reference_impl/model.py#L47
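To make the question concrete, here is a minimal sketch of Llama-3-style RoPE frequency scaling with the scale factor exposed as a parameter. It is not the reference implementation itself: the function names are hypothetical, and the defaults (`low_freq_factor=1.0`, `high_freq_factor=4.0`, `old_context_len=8192`, `theta=500000.0`, `head_dim=64`) are assumptions based on my reading of the linked config, so please check them against the actual files.

```python
import math


def scale_rope_frequency(freq, scale_factor, low_freq_factor=1.0,
                         high_freq_factor=4.0, old_context_len=8192):
    """Scale one rotary frequency (hypothetical helper, assumed defaults)."""
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    wavelen = 2 * math.pi / freq
    if wavelen < high_freq_wavelen:
        # Short wavelengths (high frequencies) are left untouched.
        return freq
    if wavelen > low_freq_wavelen:
        # Long wavelengths (low frequencies) are divided by the scale factor.
        return freq / scale_factor
    # In between, interpolate smoothly between the scaled and unscaled value.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    return (1 - smooth) * freq / scale_factor + smooth * freq


def rope_frequencies(head_dim=64, theta=500000.0):
    """Unscaled rotary frequencies; head_dim and theta are assumed values."""
    return [1.0 / theta ** (i / head_dim) for i in range(0, head_dim, 2)]


freqs = rope_frequencies()
scaled_8 = [scale_rope_frequency(f, 8) for f in freqs]
scaled_32 = [scale_rope_frequency(f, 32) for f in freqs]

# Count how many frequency bands differ between the two scale factors.
changed = sum(
    1 for a, b in zip(scaled_8, scaled_32) if not math.isclose(a, b)
)
print(f"{changed} of {len(freqs)} frequencies differ between factor 8 and 32")
```

Under these assumptions, the highest frequencies come out identical for both factors, while the lowest ones differ by a factor of 4. If that holds for the real code, running a 3.2 checkpoint with a factor of `8` would change the rotary embeddings mostly for long-range positions.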
+1
After running some long-context evals, 32 seems to be the correct scale factor, @NTT123.