Benetti-Hub opened 1 month ago
I have similar doubts about this problem. I don't know what happens to the causal mask after switching to flash attention:

```python
# att = q @ k^T / sqrt(hs)
# att = att.masked_fill(self.bias[:, :, :T, :T] == 0, float('-inf'))
# att = F.softmax(att, dim=-1)
# y = att @ v
y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```
Even if flash attention is not used, I think `T` does not need to be strictly equal to `block_size`, since the manual path slices the mask to `[:, :, :T, :T]`.
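For what it's worth, here is a minimal self-contained sketch (illustrative shapes only, not the repo's actual module code) suggesting the two paths agree for any `T <= block_size`: the manual path slices the bias buffer down to `[:T, :T]`, while `is_causal=True` builds the causal mask from the runtime sequence length and never touches `block_size`.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: batch 2, 4 heads, T=8 tokens, head size 16,
# with a model block_size of 32, so T < block_size.
B, nh, T, hs, block_size = 2, 4, 8, 16, 32
q, k, v = (torch.randn(B, nh, T, hs) for _ in range(3))

# Manual path: the bias buffer is allocated once at block_size but sliced
# down to [:T, :T], so any T <= block_size is handled.
bias = torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size)
att = (q @ k.transpose(-2, -1)) / (hs ** 0.5)
att = att.masked_fill(bias[:, :, :T, :T] == 0, float('-inf'))
att = F.softmax(att, dim=-1)
y_manual = att @ v

# Flash-attention path: is_causal=True derives the causal mask from the
# runtime T, with no reference to block_size at all.
y_flash = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(torch.allclose(y_manual, y_flash, atol=1e-5))  # expected: True
```

So, as far as I can tell, causality is preserved under flash attention, and neither path requires `T == block_size`.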
First and foremost, I want to express my appreciation for this tutorial. It's incredibly insightful and well-structured.
I'm submitting this PR because I noticed a potential issue related to `GPTConfig.block_size` not being enforced to match the sequence length `T`. If I understand correctly, this discrepancy could lead to unexpected model behavior during inference if `T` is lower than `GPTConfig.block_size`. (Note that an assertion error is already raised when `T` exceeds `GPTConfig.block_size`, as seen here.)

Thank you for considering this change. Please let me know if any further adjustments are needed.
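For reference, here is a minimal sketch (hypothetical names, not the repo's exact forward code) of the shape constraint in question: the position-embedding table is allocated at `block_size`, and the forward pass asserts `T <= block_size` before reading only the first `T` rows, which is why `T > block_size` must be rejected while `T < block_size` passes through.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the model's embedding tables.
block_size, n_embd, vocab_size = 32, 16, 100
wte = nn.Embedding(vocab_size, n_embd)   # token embeddings
wpe = nn.Embedding(block_size, n_embd)   # position embeddings, sized to block_size

def embed(idx):
    B, T = idx.size()
    # The guard that already fires for the T > block_size case.
    assert T <= block_size, f"Cannot forward sequence of length {T}, block size is only {block_size}"
    pos = torch.arange(0, T, dtype=torch.long, device=idx.device)
    return wte(idx) + wpe(pos)  # only the first T position rows are consumed

x = embed(torch.randint(0, vocab_size, (2, 8)))  # T=8 < block_size: accepted
print(x.shape)  # torch.Size([2, 8, 16])
```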