rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167

Llama 3.2 1B model #387

Closed · d-kleine closed 1 month ago

d-kleine commented 1 month ago

Bug description

Hi Sebastian,

Regarding converting-llama2-to-llama3.ipynb, I have found an inconsistency in the figure for the Llama 3.2 1B model:

import torch

LLAMA32_CONFIG_1B = {
    "vocab_size": 128_256,    # Vocabulary size
    "context_length": 8192,   # Context length
    "emb_dim": 2048,          # NEW: Half the embedding dimension
    "n_heads": 32,            # Number of attention heads
    "n_layers": 16,           # NEW: Half the number of layers
    "hidden_dim": 8192,      # NEW: Almopst half the size of the intermediate dimension in FeedForward
    "n_kv_groups": 8,         # Key-Value groups for grouped-query attention
    "rope_base": 50_000,      # The base in RoPE's "theta"
    "dtype": torch.bfloat16,  # Lower-precision dtype to save memory
    "rope_freq": {            # RoPE frequency scaling
        "factor": 32.0,       # NEW: Adjustment of the rescaling factor
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_context_length": 8192,
    }
}
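
For context, the "NEW" comments correspond to (roughly) halving the Llama 3.1 8B dimensions. A minimal sanity check, with the 8B values taken from Meta's published configuration rather than from this notebook:

# Llama 3.2 1B roughly halves the Llama 3.1 8B dimensions
# (8B values from Meta's published config, shown here only for illustration)
llama31_8b = {"emb_dim": 4096, "n_layers": 32, "hidden_dim": 14_336}
llama32_1b = {"emb_dim": 2048, "n_layers": 16, "hidden_dim": 8_192}

print(llama32_1b["emb_dim"] / llama31_8b["emb_dim"])        # 0.5   -> half the embedding dimension
print(llama32_1b["n_layers"] / llama31_8b["n_layers"])      # 0.5   -> half the number of layers
print(llama32_1b["hidden_dim"] / llama31_8b["hidden_dim"])  # ~0.57 -> "almost half" the FeedForward size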

I believe the figure should say "Embedding dimension of 2048" for the Llama 3.2 1B model.


Regarding the figure and the code: isn't the context length for both Llama 3.1 8B and Llama 3.2 1B also much larger?

https://huggingface.co/meta-llama/Llama-3.2-1B -> A large context length of 128K tokens (vs original 8K)

rasbt commented 1 month ago

Thanks for the note! The 2048 was an issue where I forgot to push the updated figures to the server, and the 8k-token one was an oversight. You are right, the model supports more tokens, 131k actually. I updated it via #389.
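
For anyone following along, "128K" on the model card corresponds to 128 * 1024 = 131_072 tokens, i.e. the 131k mentioned above. A minimal sketch of the corresponding config change, assuming the LLAMA32_CONFIG_1B dict quoted earlier (the exact notebook update is in #389):

# "128K" on the Hugging Face model card corresponds to 128 * 1024 = 131_072 tokens
LLAMA32_CONFIG_1B["context_length"] = 131_072

# The RoPE frequency scaling still refers to the original 8K pre-training window
assert LLAMA32_CONFIG_1B["rope_freq"]["original_context_length"] == 8192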

d-kleine commented 1 month ago

Thanks!

d-kleine commented 1 month ago

@rasbt Could you please also update these figures?

The context length for Llama 3.1 shown in the figure on the right (128k) doesn't match the one shown in the figure on the left.