tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

How do you calculate context vector? #295

Open freesunshine0316 opened 6 years ago

freesunshine0316 commented 6 years ago

From the output it seems that you use a weighted sum over the memory_layer outputs rather than the original encoder states. I trained with num_units=500 and observed the following log. I have bold-faced the suspicious line, where the first dimension of the decoder attention LSTM cell kernel is 1500 instead of 1000. I'm very curious what the 1500 is composed of. Thanks!

Trainable variables

embeddings/encoder/embedding_encoder:0, (19342, 500), /device:GPU:0
embeddings/decoder/embedding_decoder:0, (19099, 500), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, (1000, 2000), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/fw/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, (1000, 2000), /device:GPU:0
dynamic_seq2seq/encoder/bidirectional_rnn/bw/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/decoder/memory_layer/kernel:0, (1000, 500),
**dynamic_seq2seq/decoder/attention/basic_lstm_cell/kernel:0, (1500, 2000), /device:GPU:0**
dynamic_seq2seq/decoder/attention/basic_lstm_cell/bias:0, (2000,), /device:GPU:0
dynamic_seq2seq/decoder/attention/bahdanau_attention/query_layer/kernel:0, (500, 500), /device:GPU:0
dynamic_seq2seq/decoder/attention/bahdanau_attention/attention_v:0, (500,), /device:GPU:0
dynamic_seq2seq/decoder/attention/attention_layer/kernel:0, (1500, 500), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (500, 19099),
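For reference, here is a minimal NumPy sketch of how these shapes could arise under standard Bahdanau attention with input feeding. This is only an assumption about the dimension bookkeeping, not a confirmed description of the nmt code (which is exactly what this issue asks about); all variable names below are hypothetical.

```python
import numpy as np

num_units = 500        # decoder cell size / attention depth
enc_units = 2 * 500    # bidirectional encoder output: fw + bw = 1000
emb_size = 500         # target embedding size
src_len = 7            # arbitrary source length

# Encoder memory (values): raw bidirectional outputs, 1000-dim per position.
memory = np.random.randn(src_len, enc_units)

# memory_layer, shape (1000, 500): projects memory into keys used for scoring.
W_memory = np.random.randn(enc_units, num_units)
keys = memory @ W_memory                                  # (src_len, 500)

# query_layer (500, 500) and attention_v (500,): Bahdanau additive score.
W_query = np.random.randn(num_units, num_units)
v = np.random.randn(num_units)
cell_output = np.random.randn(num_units)                  # decoder state h_t
scores = np.tanh(keys + cell_output @ W_query) @ v        # (src_len,)
alpha = np.exp(scores) / np.exp(scores).sum()             # attention weights

# Context vector: weighted sum over the original 1000-dim memory
# (assumption; the alternative would be a sum over the 500-dim keys).
context = alpha @ memory                                  # (1000,)

# attention_layer, shape (1500, 500): projects [h_t; context] to a 500-dim
# attention vector a_t, matching 500 + 1000 = 1500 input rows.
W_attn = np.random.randn(num_units + enc_units, num_units)
attention = np.concatenate([cell_output, context]) @ W_attn   # (500,)

# Input feeding: the next decoder cell input is [embedding; previous a_t],
# i.e. 500 + 500 = 1000, and an LSTM kernel also stacks the 500-dim hidden
# state, giving (1000 + 500, 4 * 500) = (1500, 2000) for the cell kernel.
print(emb_size + num_units + num_units, 4 * num_units)        # -> 1500 2000
```

Under this reading, the 1500 rows of the decoder LSTM kernel would be embedding (500) + fed-back attention vector (500) + recurrent hidden state (500), but it would be good to have this confirmed.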