Open · LeoPerelli opened 1 year ago
Hi @karpathy and thanks for your work!

I noticed that in the definition of the self-attention matrices you use a Linear layer, which includes a bias, while I wouldn't expect one to be there since we only want the matrix. I am talking about:

```python
self.c_attn = nn.Linear(config.n_embd, 3 * config.n_embd)
```

Is this wanted or is it just a small bug? Thanks!

Why do you think the transformer paper was not using a bias? It just says "linear projections" in the paper.

Hey! I interpret a linear projection as applying a matrix, while I would call a transformation of the form Ax + b an affine transformation. Anyway, I guess it's just a tiny detail of the implementation!

Oh yeah, the term "affine" is rarely used in ML papers; they'll just say "linear", so it's simply ambiguous what the authors were implying.
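For anyone comparing the two behaviors: PyTorch's `nn.Linear` includes a bias term by default, so a pure matrix projection requires passing `bias=False` explicitly. Here is a minimal sketch contrasting the two; the `n_embd` value, tensor shapes, and variable names are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn

n_embd = 768  # illustrative embedding size (GPT-2 small uses 768)

# Default nn.Linear: computes x @ W.T + b, i.e. an affine transformation
c_attn_affine = nn.Linear(n_embd, 3 * n_embd)

# bias=False: computes x @ W.T only, a pure linear projection
c_attn_linear = nn.Linear(n_embd, 3 * n_embd, bias=False)

x = torch.randn(1, 4, n_embd)  # (batch, sequence, embedding)
q, k, v = c_attn_linear(x).split(n_embd, dim=2)  # query, key, value projections
print(q.shape, k.shape, v.shape)  # each: torch.Size([1, 4, 768])
```

Worth noting that the released GPT-2 weights do include bias terms in these projections, so keeping PyTorch's default here matches that checkpoint rather than the original transformer paper's formulation.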