When I create the input in `ptb.py`, I add the `<sos>` token at the beginning of the input sequence and an `<eos>` token at the end of the target sequence.
As you correctly observed, in `model.py` both the encoder and the decoder receive the same input during training, i.e. something like `<sos> hello world`, while the corresponding target looks like `hello world <eos>`. This way the one-token shift between input and target is guaranteed.
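To illustrate, here is a minimal sketch of that construction (the function name is mine, not the repo's exact code):

```python
# Hypothetical sketch of the pairing described above: the same sentence
# yields a <sos>-prefixed input and an <eos>-suffixed target, so that
# target[i] is always the token following input[i].
def make_pair(tokens, sos="<sos>", eos="<eos>"):
    input_seq = [sos] + tokens    # fed to both encoder and decoder
    target_seq = tokens + [eos]   # what the decoder is trained to predict
    return input_seq, target_seq

inp, tgt = make_pair(["hello", "world"])
assert inp == ["<sos>", "hello", "world"]
assert tgt == ["hello", "world", "<eos>"]
```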
So the difference to the paper is that the encoder additionally gets a `<sos>` as its first input, and the decoder starts with a `<sos>` instead of an `<eos>`, which actually makes more sense to me, since we want to start generating a new sequence.
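Side by side, with a two-word sentence (variable names are mine, for illustration only):

```python
sentence = ["hello", "world"]

# Paper's convention: plain encoder input, decoder started with <eos>.
paper_encoder_input = sentence              # ['hello', 'world']
paper_decoder_input = ["<eos>"] + sentence  # ['<eos>', 'hello', 'world']

# This repo's convention: the same <sos>-prefixed sequence for both.
repo_encoder_input = ["<sos>"] + sentence   # ['<sos>', 'hello', 'world']
repo_decoder_input = ["<sos>"] + sentence   # ['<sos>', 'hello', 'world']

# Either way, the decoder target is the sentence followed by <eos>.
decoder_target = sentence + ["<eos>"]       # ['hello', 'world', '<eos>']
```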
Ok, I understand. Because I ran your code with my own data loader rather than your `ptb.py`, I didn't notice this difference. Sorry!
In your `model.py`, the input of the decoder (i.e. `input_embedding`) is the same as the input of the encoder, which seems incorrect. According to the cited paper:

- the input of the encoder is `['RNNs', 'work']`
- the input of the decoder is `['<EOS>', 'RNNs', 'work']`
- the output of the decoder is `['RNNs', 'work', '<EOS>']`

So I think the input of the decoder should start one token earlier than the input of the encoder...
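For concreteness, the paper's scheme written out as a small sketch (token lists as above; variable names are mine):

```python
tokens = ["RNNs", "work"]

encoder_input  = tokens              # ['RNNs', 'work']
decoder_input  = ["<EOS>"] + tokens  # ['<EOS>', 'RNNs', 'work']
decoder_target = tokens + ["<EOS>"]  # ['RNNs', 'work', '<EOS>']

# decoder_input is decoder_target shifted right by one position,
# so decoder_input[i] is paired with decoder_target[i] at each step.
```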