Hey Louis,
thank you so much for the interesting question.
You are right, lstm_hidden_state and lstm_cell_state are generated randomly from a normal distribution. Hence, they are different at each forward pass. This is intentional!
For identical input batches and identical lstm_hidden_state and lstm_cell_state, the output of the LSTM should be consistent.
Using noise as the initial value of the two states is just one way to initialize them; I had good results with it during my early experiments. Using zero states is probably the default way to do it, and if you want to ensure that your output is always the same, that is the way to go.
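To make the two options concrete, here is a minimal sketch of the difference (assuming a plain PyTorch nn.LSTM; all shapes and names here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Illustrative shapes only; the real repo defines its own dimensions.
num_layers, batch_size, hidden_size, input_size = 1, 4, 32, 16
lstm = nn.LSTM(input_size, hidden_size, num_layers)
x = torch.randn(10, batch_size, input_size)  # (seq_len, batch, input_size)

# Noisy initialization (as in the repo): fresh states each forward pass,
# so lstm_out differs between passes even for the same input batch.
lstm_hidden_state = torch.randn(num_layers, batch_size, hidden_size)
lstm_cell_state = torch.randn(num_layers, batch_size, hidden_size)
lstm_out, _ = lstm(x, (lstm_hidden_state, lstm_cell_state))

# Zero initialization: identical states every pass, so the same input
# batch always produces the same output.
h0 = torch.zeros(num_layers, batch_size, hidden_size)
c0 = torch.zeros(num_layers, batch_size, hidden_size)
lstm_out_zero, _ = lstm(x, (h0, c0))
```

With the zero-initialized states, repeated forward passes over the same batch produce identical lstm_out; with the noisy states, they do not.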
Please update me if you do some experiments with zero initialization. I am super interested in the results :D
Hope this helps.
Julian
Hello,
First of all, thank you very much for the paper and code. It is very interesting.
I am not sure I correctly understand the way you define your LSTM encoder.
Indeed, looking at the code here:
I assume that lstm_hidden_state and lstm_cell_state will be different at each forward pass, as they are going to be random tensors. I actually tested the code on a trained model, and I see that in evaluation mode, for the exact same input batch, the results of lstm_out are actually different.
Am I misunderstanding something?
Thank you for the help!