kiccho1101 opened 4 years ago
As the encoder reads each element of the input sequence, its hidden state is updated according to the equation below.
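The equation (likely an image in the original post) is, in the notation of the RNN Encoder-Decoder paper, the standard RNN update:

```math
h_t = f\left(h_{t-1}, x_t\right)
```

where $x_t$ is the input at timestep $t$ and $f$ is a nonlinear activation (in the paper, a gated hidden unit).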
The conditional distribution of the output at each timestep t is given below.
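Reconstructing from the paper: with a one-hot encoding over a vocabulary of size $K$, the output distribution is a softmax over the hidden state,

```math
p(x_{t,j} = 1 \mid x_{t-1}, \ldots, x_1) = \frac{\exp\left(\mathbf{w}_j h_t\right)}{\sum_{j'=1}^{K} \exp\left(\mathbf{w}_{j'} h_t\right)}
```

where $\mathbf{w}_j$ is the output-weight row for vocabulary entry $j$.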
The hidden state of the decoder is updated by the following equation.
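Unlike the encoder, the decoder update is conditioned on the summary vector $c$ produced by the encoder and on the previous output:

```math
h_t = f\left(h_{t-1}, y_{t-1}, c\right)
```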
Therefore, the conditional distribution of the next output looks like this.
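In the paper's notation:

```math
p(y_t \mid y_{t-1}, \ldots, y_1, c) = g\left(h_t, y_{t-1}, c\right)
```

where $g$ produces a valid probability distribution, e.g. via a softmax.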
In the SMT setting, the goal is to find the translation that maximizes the posterior probability, where:
- f: translation (target sentence)
- e: source sentence
- p(e|f): translation model
- p(f): language model
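Putting the symbols above together, the posterior factorizes by Bayes' rule (as in the paper's SMT section):

```math
p(f \mid e) \propto p(e \mid f)\, p(f)
```

and decoding searches for the $f$ that maximizes this product.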
What's this paper about?
- Introduces the RNN Encoder-Decoder.
- One RNN encodes a sequence into a fixed-length vector representation, and the other decodes that representation into another sequence.
- The model is trained to maximize p(target|source).
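As a rough sketch of how the pieces fit together, here is a minimal forward pass in NumPy that computes log p(target|source). It is a simplification under stated assumptions: a vanilla tanh RNN instead of the paper's gated unit, made-up sizes, and untrained random weights, so the score is meaningless until the parameters are learned.

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 10, 8  # hypothetical vocabulary and hidden-state sizes

# Randomly initialized parameters (illustrative only, untrained)
E = rng.normal(scale=0.1, size=(V, H))      # token embeddings
W_enc = rng.normal(scale=0.1, size=(H, H))  # encoder recurrence
U_enc = rng.normal(scale=0.1, size=(H, H))  # encoder input weights
W_dec = rng.normal(scale=0.1, size=(H, H))  # decoder recurrence
U_dec = rng.normal(scale=0.1, size=(H, H))  # decoder previous-output weights
C_dec = rng.normal(scale=0.1, size=(H, H))  # maps the summary c into the decoder
W_out = rng.normal(scale=0.1, size=(H, V))  # output projection

def encode(source):
    """h_t = tanh(W h_{t-1} + U x_t); the final h is the summary vector c."""
    h = np.zeros(H)
    for tok in source:
        h = np.tanh(W_enc @ h + U_enc @ E[tok])
    return h

def decode_logprob(c, target):
    """Sum of log p(y_t | y_<t, c) under a softmax output layer."""
    h = np.tanh(C_dec @ c)  # initialize the decoder from the summary vector
    y_prev = np.zeros(H)    # no previous output at the first step
    logp = 0.0
    for tok in target:
        h = np.tanh(W_dec @ h + U_dec @ y_prev + C_dec @ c)
        logits = h @ W_out
        logits -= logits.max()  # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        logp += np.log(probs[tok])
        y_prev = E[tok]
    return logp

c = encode([1, 4, 2])
print(decode_logprob(c, [3, 5]))  # log p(target | source)
```

Training would then adjust all parameters by gradient ascent on this log-probability over a parallel corpus, which is exactly the maximize-p(target|source) objective stated above.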