In his implementation, `ConcatOutputAndAttentionWrapper` concatenates the `cell_output` and the context vector of `AttentionWrapper(cell, attention_mechanism, ...)`. In TensorFlow, the dim of `cell_output` is the unit number of `cell`, and the dim of the context vector is the dim of the `memory` passed to `attention_mechanism`. For example, in his implementation:

https://github.com/keithito/tacotron/blob/master/models/tacotron.py#L51-L55

the dim of `cell_output` is `hp.attention_depth`, and the dim of the context vector is `hp.encoder_depth`. So the concatenated dim is `hp.attention_depth + hp.encoder_depth`, not `hp.attention_depth * 2`.
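To make the bookkeeping concrete, here is a minimal sketch using the TF 1.x `tf.contrib.seq2seq` API (the shapes and the value 256 are placeholders, not the repo's actual hyperparameters): the wrapper's `output_size` comes from the wrapped cell, while the context size comes from the memory depth.

```python
import tensorflow as tf

# Placeholder "encoder outputs": [batch, time, memory_depth]; the last axis
# plays the role of hp.encoder_depth.
encoder_outputs = tf.zeros([32, 100, 256])

attention_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.contrib.rnn.GRUCell(256),                                 # cell_output dim = the cell's unit number
    tf.contrib.seq2seq.BahdanauAttention(256, encoder_outputs),  # this 256 is only the query/key depth
    output_attention=False)

print(attention_cell.output_size)           # 256: the wrapped GRU's units
print(attention_cell.state_size.attention)  # 256: context dim = depth of the memory
# ConcatOutputAndAttentionWrapper concatenates these two, so its output dim is their sum.
```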
@syang1993 What you are looking at is the master branch; what I am talking about is the tacotron2-work-in-progress branch, where `attention_depth` is set to 128.
@begeekmyfriend It's the same in the tacotron2-work-in-progress branch; I just used the master branch as an example. For that branch the dim is also `hp.attention_depth + hp.encoder_depth*2 = 640` (since the encoder is a BLSTM, its output, and hence the context vector, has dim `2*hp.encoder_depth`; with `attention_depth = 128` and `encoder_depth = 256`, that is 128 + 512 = 640).
You can see the implementation of `AttentionWrapper` for details: the `attention_depth` in this repo will only affect the query and key depth of the attention mechanism.
To make it easier to understand, suppose:

```python
encoder_output = encoder_net(input_text)   # [batch_size, length, dim_a]
attention_cell = AttentionWrapper(
    DecoderPrenet(GRUCell(dim_b)),
    attention_mechanism(dim_c, encoder_output))
concat_cell = ConcatOutputAndAttentionWrapper(attention_cell)
```
Then the dim of `concat_cell` is `dim_a + dim_b`. The `dim_c` (which refers to the `attention_depth`) doesn't affect the dims.
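To sanity-check that, here is a hedged sketch with the TF 1.x contrib API, using made-up sizes `dim_a = 512`, `dim_b = 128`, `dim_c = 64` (the prenet wrapper is left out for simplicity); `dim_c` never shows up in the resulting sizes.

```python
import tensorflow as tf

memory = tf.zeros([8, 50, 512])                        # dim_a = 512, e.g. a BLSTM encoder output

cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.contrib.rnn.GRUCell(128),                       # dim_b = 128
    tf.contrib.seq2seq.BahdanauAttention(64, memory),  # dim_c = 64, query/key depth only
    output_attention=False)

print(cell.output_size)                                # 128 -> dim_b
print(cell.state_size.attention)                       # 512 -> dim_a
print(cell.output_size + cell.state_size.attention)    # 640: what ConcatOutputAndAttentionWrapper would output
```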
@syang1993 I just printed the `output_size` of each cell. You are right.
In tacotron.py we can see that `attention_depth` is set to 128, and the attention cell is then wrapped in `ConcatOutputAndAttentionWrapper`, in which the attention RNN output and the context vector are concatenated together. But I guess the concatenated dim should be 256 instead of 512.