harvardnlp / annotated-transformer

An annotated implementation of the Transformer paper.
http://nlp.seas.harvard.edu/annotated-transformer
MIT License

Typo in Multihead-attention #98

Closed Jostarndt closed 1 year ago

Jostarndt commented 2 years ago

In MultiHeadedAttention, the line

self.linears = clones(nn.Linear(d_model, d_model), 4)

occurs, but it should be a 3 instead of a 4:

self.linears = clones(nn.Linear(d_model, d_model), 3)

Am I correct?

Jostarndt commented 2 years ago

Credit shared with @samiede.

stanwinata commented 2 years ago

@Jostarndt, perhaps not, since you need 4 self.linears:

  1. linear_Q (to project query)
  2. linear_K (to project key)
  3. linear_V (to project value)
  4. linear_O (to project output)

It's true that the fourth linear is a bit hidden, since it sits at the very top of the diagram, applied after the concat, as seen in the image below 😅 (a condensed code sketch follows the image).

[Screenshot: the multi-head attention diagram, with the final Linear layer applied after the Concat]
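To make the four uses concrete, here is a condensed, self-contained sketch of the module (paraphrased, not the exact repo source; the attention math is inlined rather than calling the repo's attention helper):

import copy
import math

import torch.nn as nn

def clones(module, N):
    # Produce N independently initialized copies of a module.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

class MultiHeadedAttention(nn.Module):
    def __init__(self, h, d_model):
        super().__init__()
        assert d_model % h == 0
        self.d_k = d_model // h
        self.h = h
        # Four linears: Q, K, V projections plus the output projection.
        self.linears = clones(nn.Linear(d_model, d_model), 4)

    def forward(self, query, key, value, mask=None):
        nbatches = query.size(0)
        # zip pairs the first THREE linears with (query, key, value);
        # the fourth linear is not consumed here.
        query, key, value = [
            lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]
        # Scaled dot-product attention, computed per head.
        scores = query @ key.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        x = scores.softmax(dim=-1) @ value
        # Concat the heads, then apply the FOURTH linear: the output projection.
        x = x.transpose(1, 2).contiguous().view(nbatches, -1, self.h * self.d_k)
        return self.linears[-1](x)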
ljmanso commented 1 year ago

I was looking through the issues section with the same doubt in mind, and @stanwinata is right. The code is correct, just a bit obscure (I spent half an hour figuring it out).

When you use zip, the resulting iterator stops with the shortest iterable, so zipping two iterables yields only as many pairs as the shorter one has items. For instance:

>>> long_iterable = [1, 2, 3, 4]
>>> short_iterable = ['q', 'k', 'v']
>>> for a, b in zip(long_iterable, short_iterable):
...     print(a, b)
...
1 q
2 k
3 v
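This is exactly what happens in the repo's forward pass: zip(self.linears, (query, key, value)) pairs only the first three of the four linears with the inputs, and self.linears[-1] is applied separately after the concat. A minimal stand-in illustration (the string names here are hypothetical placeholders for the four nn.Linear modules):

linears = ['linear_Q', 'linear_K', 'linear_V', 'linear_O']
inputs = ('query', 'key', 'value')

# zip yields only 3 pairs, so linear_O is never touched by this loop.
for lin, x in zip(linears, inputs):
    print(lin, '<-', x)
# linear_Q <- query
# linear_K <- key
# linear_V <- value

print(linears[-1])  # linear_O: the output projection, applied after the concat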

Jostarndt commented 1 year ago

Yes, I forgot to reply to the first answer! You are right, thanks for pointing out number 4!