seabay opened this issue 6 years ago
Hi @seabay
You might misunderstand what the hidden state of the 1st layer and the hidden state of the 2nd layer is.
With encoder_hidden[:decoder_test.n_layers] we extract the normal-time-order hidden state (--->), while encoder_hidden[decoder_test.n_layers:] gives us the reverse-time-order hidden state (<---).
Though in my opinion which one you use might not really matter, it's more common to use the normal-time-order hidden state of a Bi-RNN.
Hope it helps.
Hi @Engine-Treasure, I think the number of layers has nothing to do with bi-direction. For example, if the encoder is a 2-layer Bi-RNN, the hidden state has 2 * 2 = 4 parts: the first two parts are the forward and backward states of layer 1, and the last two parts are those of layer 2.
So the question is: do we use the hidden state of layer 1 or layer 2?
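The shape described above can be checked with a minimal sketch (the layer/size numbers here are arbitrary, not from the tutorial): a 2-layer bidirectional GRU returns a hidden state with num_layers * num_directions = 4 rows, which can be viewed as (num_layers, num_directions, batch, hidden_size).

```python
import torch
import torch.nn as nn

n_layers, hidden_size, batch, seq_len, input_size = 2, 8, 3, 5, 4

# A 2-layer bidirectional GRU, analogous to the encoder under discussion
gru = nn.GRU(input_size, hidden_size, n_layers, bidirectional=True)

x = torch.randn(seq_len, batch, input_size)
output, h_n = gru(x)

# h_n stacks num_layers * num_directions = 4 final states
print(h_n.shape)  # torch.Size([4, 3, 8])

# It can be viewed as (num_layers, num_directions, batch, hidden_size)
h_view = h_n.view(n_layers, 2, batch, hidden_size)
```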
Hi @seabay
So sorry for mixing up num_layers and hidden_size.
Then here comes another question: do the forward and backward hidden states alternate within layers, or do all the forward hidden states come first?
[
layer0_forward
layer0_backward
layer1_forward
layer1_backward
] or
[
layer0_forward
layer1_forward
layer0_backward
layer1_backward
]
You can find some answers here
@spro's answer is that they alternate within layers. However, the code we're talking about doesn't seem to match that answer.
I just got more confused :(
Hi @Engine-Treasure, based on my experiments the code matches the first layout, which alternates within layers. But then why does @spro choose the first layer as the context vector for the decoder?
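For reference, that experiment can be reproduced with a short check (a sketch; sizes are arbitrary). Since `output` only contains the top layer, its last forward slice and first backward slice are the top layer's final states; these match the last two rows of `h_n`, which is only consistent with the first layout (alternating within layers).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_layers, hidden_size, batch, seq_len, input_size = 2, 8, 3, 5, 4
gru = nn.GRU(input_size, hidden_size, n_layers, bidirectional=True)
output, h_n = gru(torch.randn(seq_len, batch, input_size))

# `output` holds only the top layer, so its last/first time steps
# contain the top layer's forward/backward final states
fwd_last = output[-1, :, :hidden_size]  # forward direction, final step
bwd_last = output[0, :, hidden_size:]   # backward direction, final step

# Under layout 1 (alternating within layers), h_n[-2] and h_n[-1]
# are the top layer's forward and backward states
print(torch.allclose(h_n[-2], fwd_last))  # True
print(torch.allclose(h_n[-1], bwd_last))  # True
```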
This won't be a very satisfying answer, but I believe the reason is just that this is left over from a non-bidirectional encoder, and this slicing was a workaround to make it fit the decoder. The batched version is still very much a work in progress (despite the lack of recent progress).
Two better solutions would be:
I have the same question, and based on intuition I agree with pattern 1. But through my experiments I found the following:
https://discuss.pytorch.org/t/gru-output-and-h-n-relationship/12720
@spro I think summing the forward and backward hidden states at the last position of the encoder would not be a good idea, because at the last position the backward RNN has only read one token, so its hidden state there contains little information about the sentence.
Hi all, I have some confusion about this:
decoder_hidden = encoder_hidden[:decoder_test.n_layers] # Use last (forward) hidden state from encoder,
should this be
decoder_hidden = encoder_hidden[decoder_test.n_layers:]? Because that slice is the hidden state of the second layer.
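A minimal check of that slicing (a sketch; decoder_test.n_layers is assumed to be 2 here, matching a 2-layer encoder, and the sizes are arbitrary): with the layout [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd], slicing with [n_layers:] selects the top layer's pair of direction states, while [:n_layers] selects the bottom layer's.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_layers, hidden_size, batch, seq_len, input_size = 2, 8, 3, 5, 4
encoder = nn.GRU(input_size, hidden_size, n_layers, bidirectional=True)
_, encoder_hidden = encoder(torch.randn(seq_len, batch, input_size))

# Layout: [layer0_fwd, layer0_bwd, layer1_fwd, layer1_bwd]
top = encoder_hidden[n_layers:]     # layer 1 (top layer): fwd and bwd
bottom = encoder_hidden[:n_layers]  # layer 0: fwd and bwd

# Viewing as (num_layers, num_directions, batch, hidden_size) confirms
# that encoder_hidden[n_layers:] is the top layer's pair of states
h_view = encoder_hidden.view(n_layers, 2, batch, hidden_size)
print(torch.equal(top, h_view[-1]))  # True
```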