kiccho1101 opened 4 years ago
As the encoder reads each element of the input sequence, its hidden state is updated according to the equation below.
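The equation (likely an image in the original post) is, in the notation of the RNN Encoder-Decoder paper, the standard RNN update:

```math
h_t = f\left(h_{t-1}, x_t\right)
```

where $x_t$ is the input at timestep $t$ and $f$ is a nonlinear activation (in the paper, a gated hidden unit).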
The conditional distribution of the output at each timestep t is given below.
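Reconstructing from the paper: with a one-hot encoding over a vocabulary of size $K$, the output distribution is a softmax over the hidden state,

```math
p(x_{t,j} = 1 \mid x_{t-1}, \ldots, x_1) = \frac{\exp\left(\mathbf{w}_j h_t\right)}{\sum_{j'=1}^{K} \exp\left(\mathbf{w}_{j'} h_t\right)}
```

where $\mathbf{w}_j$ is the output-weight row for vocabulary entry $j$.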
The hidden state of the decoder is updated by the following equation.
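Unlike the encoder, the decoder update is conditioned on the summary vector $c$ produced by the encoder and on the previous output:

```math
h_t = f\left(h_{t-1}, y_{t-1}, c\right)
```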
Therefore, the conditional distribution of the next output looks like this.
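In the paper's notation:

```math
p(y_t \mid y_{t-1}, \ldots, y_1, c) = g\left(h_t, y_{t-1}, c\right)
```

where $g$ produces a valid probability distribution, e.g. via a softmax.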
In the SMT setting, the goal is to find the translation that maximizes the posterior probability, where:
- f: translation (target sentence)
- e: source sentence
- p(e|f): translation model
- p(f): language model
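Putting the symbols above together, the posterior factorizes by Bayes' rule (as in the paper's SMT section):

```math
p(f \mid e) \propto p(e \mid f)\, p(f)
```

and decoding searches for the $f$ that maximizes this product.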
What's this paper about?
- Introduces the RNN Encoder-Decoder.
- One RNN encodes a sequence into a fixed-length vector representation, and the other decodes that representation into another sequence.
- The model is trained to maximize p(target|source).
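As a rough sketch of how the pieces fit together, here is a minimal forward pass in NumPy that computes log p(target|source). It is a simplification under stated assumptions: a vanilla tanh RNN instead of the paper's gated unit, made-up sizes, and untrained random weights, so the score is meaningless until the parameters are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 8  # hypothetical vocabulary and hidden-state sizes

# Randomly initialized parameters (illustrative only, untrained)
E = rng.normal(scale=0.1, size=(V, H))      # token embeddings
W_enc = rng.normal(scale=0.1, size=(H, H))  # encoder recurrence
U_enc = rng.normal(scale=0.1, size=(H, H))  # encoder input weights
W_dec = rng.normal(scale=0.1, size=(H, H))  # decoder recurrence
U_dec = rng.normal(scale=0.1, size=(H, H))  # decoder previous-output weights
C_dec = rng.normal(scale=0.1, size=(H, H))  # maps the summary c into the decoder
W_out = rng.normal(scale=0.1, size=(H, V))  # output projection

def encode(source):
    """h_t = tanh(W h_{t-1} + U x_t); the final h is the summary vector c."""
    h = np.zeros(H)
    for tok in source:
        h = np.tanh(W_enc @ h + U_enc @ E[tok])
    return h

def decode_logprob(c, target):
    """Sum of log p(y_t | y_<t, c) under a softmax output layer."""
    h = np.tanh(C_dec @ c)  # initialize the decoder from the summary vector
    y_prev = np.zeros(H)    # no previous output at the first step
    logp = 0.0
    for tok in target:
        h = np.tanh(W_dec @ h + U_dec @ y_prev + C_dec @ c)
        logits = h @ W_out
        logits -= logits.max()  # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        logp += np.log(probs[tok])
        y_prev = E[tok]
    return logp

c = encode([1, 4, 2])
print(decode_logprob(c, [3, 5]))  # log p(target | source)
```

Training would then adjust all parameters by gradient ascent on this log-probability over a parallel corpus, which is exactly the maximize-p(target|source) objective stated above.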