Closed: timolohrenz closed this issue 4 years ago
Yes, this is done on purpose. This way you save a lot of parameters, you generalize better, and (at least on the tasks we have considered so far) you improve performance. Best,
Mirco
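For a rough sense of the parameter saving Mirco describes, here is a hypothetical back-of-the-envelope count, assuming one layer with 512 input and 512 hidden units and 4 gate projections (the nn.Linear modules below are only illustrative, not the project's actual code):

```python
import torch.nn as nn

def n_params(modules):
    # Total number of trainable parameters across a list of modules.
    return sum(p.numel() for m in modules for p in m.parameters())

feat, hidden, gates = 512, 512, 4

# Shared input-hidden projections: both directions reuse the same 4 Linears.
shared = [nn.Linear(feat, hidden) for _ in range(gates)]

# A conventional bidirectional layer keeps a separate set per direction.
per_direction = [nn.Linear(feat, hidden) for _ in range(2 * gates)]

print(n_params(shared))         # 4 * (512*512 + 512) = 1,050,624
print(n_params(per_direction))  # 8 * (512*512 + 512) = 2,101,248
```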
Hey Mirco,
Okay, thanks for pointing that out. Interesting!
Best regards, Timo
I am really thankful for your LSTM implementation, which is, as far as I know, the only non-cuDNN-based one that allows customizations such as static dropout masks. However, I think there might be an issue.
As I understand the implementation of the bidirectional LSTM, the backward direction is handled by flipping the inputs in time and appending them as additional batch entries to the input tensor at this point: https://github.com/mravanelli/pytorch-kaldi/blob/775f5dbbf142fb1c1a56604ee603d426ca73a51f/neural_networks.py#L415-L417
Later in the forward pass, the input tensor x is then passed through all 4 input-hidden weight matrices: https://github.com/mravanelli/pytorch-kaldi/blob/775f5dbbf142fb1c1a56604ee603d426ca73a51f/neural_networks.py#L431-L435
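To make the question concrete, here is a minimal sketch of how I read those two steps, assuming x has shape [time, batch, features]; the tensor sizes and the w_ix/w_fx/w_ox/w_cx names are placeholders for illustration, not the code in neural_networks.py:

```python
import torch
import torch.nn as nn

time_steps, batch, feat, hidden = 200, 8, 512, 512
x = torch.randn(time_steps, batch, feat)  # [time, batch, feat]

# Backward direction: flip the sequence in time and append it along the
# batch dimension, so the tensor now holds 2*batch "utterances".
x_bidir = torch.cat([x, torch.flip(x, dims=[0])], dim=1)  # [time, 2*batch, feat]

# The same four input-hidden projections (one per gate) are applied to the
# whole stacked batch, i.e. forward and backward share these weights.
w_ix = nn.Linear(feat, hidden)  # input gate
w_fx = nn.Linear(feat, hidden)  # forget gate
w_ox = nn.Linear(feat, hidden)  # output gate
w_cx = nn.Linear(feat, hidden)  # candidate cell state

wix_out, wfx_out, wox_out, wcx_out = (w(x_bidir) for w in (w_ix, w_fx, w_ox, w_cx))
print(wix_out.shape)  # torch.Size([200, 16, 512])
```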
Doesn't that mean that the exact same weight matrices are applied to both directions? I am a bit suspicious, as torchsummary shows only 4 weight matrices for the input-hidden connections, while showing 8 weight matrices for the hidden-hidden connections (I am using a layer size of 512 in the LSTM, so each input-hidden matrix has 512*513 = 262,656 parameters):

Layer (type)   Output Shape     Param #
LSTM-159       [-1, 200, 512]   0
Linear-160     [-1, 400, 512]   262,656
Linear-161     [-1, 400, 512]   262,656
Linear-162     [-1, 400, 512]   262,656
Linear-163     [-1, 400, 512]   262,656
Linear-164     [-1, 512]        262,144
Linear-165     [-1, 512]        262,144
Linear-166     [-1, 512]        262,144
Linear-167     [-1, 512]        262,144
Tanh-168       [-1, 512]        0
Tanh-169       [-1, 512]        0
Linear-170     [-1, 512]        262,144
Linear-171     [-1, 512]        262,144
Linear-172     [-1, 512]        262,144
Linear-173     [-1, 512]        262,144
Tanh-174       [-1, 512]        0
Tanh-175       [-1, 512]        0
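As a quick sanity check on the two counts in the summary above (an illustrative snippet, not code from the repository): the 262,656 entries are consistent with a 512x512 Linear layer that includes a bias, while the 262,144 entries match one without a bias.

```python
import torch.nn as nn

# 512*512 weights plus a 512-element bias vs. weights only.
with_bias    = sum(p.numel() for p in nn.Linear(512, 512).parameters())
without_bias = sum(p.numel() for p in nn.Linear(512, 512, bias=False).parameters())

print(with_bias)     # 512*512 + 512 = 262,656
print(without_bias)  # 512*512       = 262,144
```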
This also means that the number of parameters does not double when using a bidirectional LSTM, as discussed in issue https://github.com/mravanelli/pytorch-kaldi/issues/214
Is this intentional behavior, or am I getting something wrong?
Thanks for your help and the good work!