humza909 / LLM_Survey


Mistake in Chapter 2 - Background #2

Open cornzz opened 3 months ago

cornzz commented 3 months ago

Hi, I noticed a mistake in the description of cross attention in 2.3. Attention in LLMs:

Cross Attention: It is used in encoder-decoder architectures, where encoder outputs are the queries, and key-value pairs come from the decoder.

In reality it is the other way around: the queries come from the decoder self-attention, while the encoder outputs act as the keys and values. See also here or here, chapter 3.2.3:

In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder.