jzilly / RecurrentHighwayNetworks

Recurrent Highway Networks - Implementations for Tensorflow, Torch7, Theano and Brainstorm
MIT License

Common weights for raw input and current state #14

Closed Justin-Tan closed 7 years ago

Justin-Tan commented 7 years ago

I was looking at the TF implementation, and it seems that the same weights are used for the input x^{[t]} and the hidden state s^{[t]} in the function linear(...). In the paper (Eqs. 7, 8, 9), however, they are labelled as distinct matrices, W and R. Is that right?

Let me know if I'm missing something.

Great paper, by the way, early results seem to be competitive with deep bidirectional GRUs for sequence classification.

jzilly commented 7 years ago

Hi Justin, thanks for taking such a close look.

The function linear() takes a list of tensors (among other things) as input. These inputs are stacked, and a single weight matrix is allocated to compute the matrix-vector product of the stacked inputs. To save computation time, both Wx[t] and Ry[t-1] are computed simultaneously in one product:

out = [W, R] * [x; y]

In this case, the matrix allocated in linear() is [W, R], and the associated inputs are stacked to make this construction work. Please note that x[t] and y[t-1] are therefore not processed by the same weights (as far as I can see). Thanks again for your feedback.
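The fusion described above can be sketched in a few lines of NumPy. This is not the repo's actual linear() function, just a minimal illustration (with made-up sizes) that stacking the inputs and allocating one combined weight matrix [W, R] gives the same result as applying the paper's separate W and R matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 5

x = rng.standard_normal(input_size)          # current input x[t]
y = rng.standard_normal(hidden_size)         # previous state y[t-1]
W = rng.standard_normal((hidden_size, input_size))   # input weight matrix
R = rng.standard_normal((hidden_size, hidden_size))  # recurrent weight matrix

# Two separate products, as written in the paper (Eqs. 7-9):
separate = W @ x + R @ y

# One fused product, as done in linear(): [W, R] @ [x; y]
fused = np.concatenate([W, R], axis=1) @ np.concatenate([x, y])

assert np.allclose(separate, fused)
```

The fused form does the same arithmetic as the two separate matmuls, but a single larger matrix multiplication is typically faster on GPUs than two smaller ones.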

Please let me know if there is anything else that should be made clearer in the code.