Closed — Justin-Tan closed this issue 7 years ago.
Hi Justin, Thanks for having such a close look.
The function linear() takes a list of tensors (among other arguments) as input. These tensors are stacked, and a single weight matrix is allocated to compute the matrix-vector product against the stacked inputs. To save computation time, both W*x[t] and R*y[t-1] are computed simultaneously as out = [W, R] * [x[t]; y[t-1]]. In this case, the matrix allocated in linear() is the concatenation [W, R], and the inputs are stacked to make this construction work. Note that x[t] and y[t-1] are therefore not processed by the same weights (as far as I can see). Thanks again for your feedback.
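As a sanity check of the identity above, here is a minimal NumPy sketch (variable names and dimensions are hypothetical, not taken from the repo) showing that one fused product with the concatenated matrix [W, R] equals the two separate products W*x[t] + R*y[t-1]:

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 3, 4                          # hypothetical sizes
x = rng.standard_normal(n_in)                  # input x[t]
y = rng.standard_normal(n_hidden)              # previous output y[t-1]
W = rng.standard_normal((n_hidden, n_in))      # input weight matrix
R = rng.standard_normal((n_hidden, n_hidden))  # recurrent weight matrix

# Separate products, as written in the paper: W x[t] + R y[t-1]
separate = W @ x + R @ y

# Fused product: concatenate the weights column-wise, stack the inputs
fused = np.concatenate([W, R], axis=1) @ np.concatenate([x, y])

print(np.allclose(separate, fused))  # the two computations agree
```

Because [W, R] is a block matrix, each input still meets only its own weight block, so the fusion is purely a performance optimization.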
Please let me know if there is anything else that should be made clearer in the code.
I was looking at the TF implementation, and it seems that you use the same weights for the input x^{[t]} and the hidden state s^{[t]} in the function linear(...), but in the paper (Eqs. 7, 8, 9) they are labelled as distinct matrices, W and R?
Let me know if I'm missing something.
Great paper, by the way, early results seem to be competitive with deep bidirectional GRUs for sequence classification.