rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167

minor fixes: Llama 3.2 standalone #420

Closed d-kleine closed 1 month ago

d-kleine commented 1 month ago
review-notebook-app[bot] commented 1 month ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

d-kleine commented 1 month ago

If you find some spare time, it would be great if you could apply these formatting changes to the figure:

Bold print seems to indicate what has been changed relative to the previous model, so:

I really enjoy these notebooks and figures around GPT-2 and the Llama 2/3 models here - it's a great round-up of the contents of the book, both technically (code) and visually (figures)!

rasbt commented 1 month ago

Good catch regarding the 72. I also reformatted the RoPE base as float to make it consistent with the other RoPE float settings. (I synced the figure, but it may be a few hours until the change takes effect due to GitHub's caching.)
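For reference, a minimal sketch of what "RoPE base as float" means in a Llama-style precomputation (function name and defaults are illustrative, not the notebook's exact code; `500_000.0` is Llama 3's rotary base written as a float for consistency with the other float-valued RoPE settings):

```python
import torch

def precompute_rope_params(head_dim, theta_base=500_000.0, context_length=8192):
    # theta_base is a float (e.g. 500_000.0 for Llama 3), consistent with
    # the other float RoPE settings such as frequency-scaling factors.
    inv_freq = 1.0 / (theta_base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(context_length).float()
    angles = positions[:, None] * inv_freq[None, :]   # (context_length, head_dim // 2)
    angles = torch.cat([angles, angles], dim=1)       # (context_length, head_dim)
    return torch.cos(angles), torch.sin(angles)
```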

d-kleine commented 1 month ago

@rasbt Thanks! I just took a look at the updated figure; the "32 heads" of Llama 3 8B are still in bold print.

Also, I noticed another piece of information in the figure that might need an update:

You could also add that Llama 2 already used GQA for the larger models (34B and 70B) for improved inference scalability. I think this is an interesting piece of information for the figure.
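The GQA point above can be sketched briefly: grouped-query attention keeps the full number of query heads but shares a smaller set of key/value heads across groups, shrinking the KV cache at inference time. A minimal, hedged illustration of the head-expansion step (the helper name is hypothetical; Llama 2 70B's 64 query heads over 8 KV heads are the published configuration):

```python
import torch

def expand_kv(kv, n_rep):
    # kv: (batch, n_kv_heads, seq_len, head_dim)
    # Repeat each KV head n_rep times along the head axis so that
    # n_kv_heads * n_rep query heads can each attend to a matching KV head.
    return kv.repeat_interleave(n_rep, dim=1)

# Llama 2 70B: 64 query heads share 8 KV heads (group size 64 // 8 = 8),
# so only 8 KV heads need to be cached during inference.
n_heads, n_kv_heads = 64, 8
k = torch.randn(1, n_kv_heads, 4, 128)
k_expanded = expand_kv(k, n_heads // n_kv_heads)  # shape (1, 64, 4, 128)
```

Only the 8 original KV heads are stored in the cache; the expansion is a cheap view-style repeat done at attention time.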

rasbt commented 1 month ago

Thanks, I will try to update it in the next few days!

rasbt commented 1 month ago

Looks like I had fixed the "heads" in the Llama figure but then forgot to apply the change to some of the figures where it's used as a subfigure. Good call regarding the RoPE, btw. Should be taken care of now!

d-kleine commented 1 month ago

Looks great, thanks! Superb comprehensive overview btw!