tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

How do you calculate context vector? #295

Open freesunshine0316 opened 6 years ago

freesunshine0316 commented 6 years ago

From the output it seems that you use a weighted sum over the memory_layer outputs rather than the original encoder states. I trained with num_units=500 and observed the following log. I have bold-faced the suspicious line, where the first dimension of the decoder attention LSTM cell kernel is 1500 instead of 1000. I'm very curious what the 1500 is composed of. Thanks!

Trainable variables

embeddings/encoder/embedding_encoder:0, (19342, 500), /device:GPU:0
embeddings/decoder/embedding_decoder:0, (19099, 500), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, (1000, 2000), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, (1000, 2000), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/decoder/memory_layer/kernel:0, (1000, 500),
**dynamic_seq2seq/decoder/attention/basic_lstm_cell/kernel:0, (1500, 2000), /device:GPU:0**
dynamic_seq2seq/decoder/attention/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/decoder/attention/bahdanau_attention/query_layer/kernel:0, (500, 500), /device:GPU:0
dynamic_seq2seq/decoder/attention/bahdanau_attention/attention_v:0, (500,), /device:GPU:0
dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (1500, 500), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (500, 19099),
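For reference, here is a minimal NumPy sketch of how these shapes could arise under standard Bahdanau attention with input feeding. This is only an assumption about the dimension bookkeeping, not a confirmed description of the nmt code (which is exactly what this issue asks about); all variable names below are hypothetical.

```python
import numpy as np

num_units = 500        # decoder cell size / attention depth
enc_units = 2 * 500    # bidirectional encoder output: fw + bw = 1000
emb_size = 500         # target embedding size
src_len = 7            # arbitrary source length

# Encoder memory (values): raw bidirectional outputs, 1000-dim per position.
memory = np.random.randn(src_len, enc_units)

# memory_layer, shape (1000, 500): projects memory into keys used for scoring.
W_memory = np.random.randn(enc_units, num_units)
keys = memory @ W_memory                                  # (src_len, 500)

# query_layer (500, 500) and attention_v (500,): Bahdanau additive score.
W_query = np.random.randn(num_units, num_units)
v = np.random.randn(num_units)
cell_output = np.random.randn(num_units)                  # decoder state h_t
scores = np.tanh(keys + cell_output @ W_query) @ v        # (src_len,)
alpha = np.exp(scores) / np.exp(scores).sum()             # attention weights

# Context vector: weighted sum over the original 1000-dim memory
# (assumption; the alternative would be a sum over the 500-dim keys).
context = alpha @ memory                                  # (1000,)

# attention_layer, shape (1500, 500): projects [h_t; context] to a 500-dim
# attention vector a_t, matching 500 + 1000 = 1500 input rows.
W_attn = np.random.randn(num_units + enc_units, num_units)
attention = np.concatenate([cell_output, context]) @ W_attn   # (500,)

# Input feeding: the next decoder cell input is [embedding; previous a_t],
# i.e. 500 + 500 = 1000, and an LSTM kernel also stacks the 500-dim hidden
# state, giving (1000 + 500, 4 * 500) = (1500, 2000) for the cell kernel.
print(emb_size + num_units + num_units, 4 * num_units)        # -> 1500 2000
```

Under this reading, the 1500 rows of the decoder LSTM kernel would be embedding (500) + fed-back attention vector (500) + recurrent hidden state (500), but it would be good to have this confirmed.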