lsdefine / attention-is-all-you-need-keras

A Keras+TensorFlow Implementation of the Transformer: Attention Is All You Need

Why weren't K and V passed from the top encoder to the bottom decoder model? #21

Closed ichenjia closed 5 years ago

ichenjia commented 5 years ago

The paper specifies that it isn't the Z (output) that gets passed; it is actually K and V that get passed to the decoder. In the code, the decoder simply takes the encoder output.
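For context, here is a minimal sketch (not this repo's actual code; TensorFlow 2.x is assumed, and names like `cross_attention`, `d_model`, `enc_output`, and `dec_input` are illustrative) of how encoder-decoder attention typically derives K and V from the encoder output Z through learned projections, which is why passing Z alone is equivalent to passing K and V:

```python
import tensorflow as tf
from tensorflow import keras

d_model = 64

# Learned projections; in cross-attention, Q is projected from decoder
# states, while K and V are both projected from the encoder output Z.
w_q = keras.layers.Dense(d_model, use_bias=False)
w_k = keras.layers.Dense(d_model, use_bias=False)
w_v = keras.layers.Dense(d_model, use_bias=False)

def cross_attention(dec_input, enc_output):
    # K and V are computed inside the decoder from Z, so the decoder only
    # needs to receive Z (the encoder output), not K and V themselves.
    q = w_q(dec_input)
    k = w_k(enc_output)
    v = w_v(enc_output)
    scores = tf.matmul(q, k, transpose_b=True) / tf.math.sqrt(
        tf.cast(d_model, tf.float32))
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(weights, v)

dec = tf.random.normal((2, 5, d_model))  # (batch, target_len, d_model)
enc = tf.random.normal((2, 7, d_model))  # (batch, source_len, d_model)
out = cross_attention(dec, enc)          # -> shape (2, 5, d_model)
```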