yos1up / DNC

Differentiable Neural Computers

A variant of the deep LSTM? #2

[Open] Seraphli opened this issue 7 years ago

Seraphli commented 7 years ago

On page 7 of the article, the authors write:

we have used the following variant of the deep LSTM architecture

But in your code, you are using the LSTM model defined in the Chainer framework. So I wonder whether your model is an exact representation of the model described in the article. I am not arguing with you; I just want to discuss this with you.

physicso commented 7 years ago

I don't think the variant of the LSTM is different from the original version, except that the input x_t is designed to include the memory readings :)

Seraphli commented 7 years ago

If you take a look at the first page of the METHODS part, you can see that the formulation of the input gate includes three inputs: x_t, h_{t-1}^l and h_t^{l-1}. I think the input gate of the original LSTM only uses x_t and h_{t-1}^l. That is why I have this question.
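For concreteness, the gate I am referring to looks roughly like this (my own paraphrase of the Methods equation, written in the thread's notation):

```latex
% Input gate of layer l in the deep LSTM variant (paraphrased):
i_t^l = \sigma\!\left( W_i^l \, [\, x_t;\; h_{t-1}^l;\; h_t^{l-1} \,] + b_i^l \right)
```

whereas the textbook single-layer LSTM gate only conditions on [x_t; h_{t-1}].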

physicso commented 7 years ago

@Seraphli I think it is just the standard expression for an LSTM in a network with several hidden layers. The l-th layer cell gets input from its own past (h_{t-1}^l), from the layer below it (h_t^{l-1}) and from the sample (x_t). You may also refer to Eq. (11) in [Graves, A., Mohamed, A.-r. & Hinton, G. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (eds Ward, R. et al.) 6645–6649 (Curran Associates, 2013).] for an explicit expression of the deep LSTM, which is the same as in the Nature paper :)
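To make that concrete, here is a minimal NumPy sketch of one layer's update in a stacked LSTM (hypothetical variable and function names, not the actual code in this repository); note that all four gate blocks see the same concatenation [x_t; h_{t-1}^l; h_t^{l-1}]:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def deep_lstm_layer_step(x_t, h_prev, h_below, s_prev, W, b):
    """One time step of layer l in a stacked (deep) LSTM.

    x_t     -- external input at time t (in the DNC this would also carry the
               memory read vectors from the previous time step)
    h_prev  -- h_{t-1}^l, this layer's hidden state from the previous step
    h_below -- h_t^{l-1}, the lower layer's hidden state at the current step
               (an empty array for the bottom layer)
    s_prev  -- s_{t-1}^l, this layer's previous cell state
    W, b    -- weights/biases for the input, forget, candidate and output
               blocks, stacked so that W has shape
               (4*H, len(x_t) + len(h_prev) + len(h_below))
    """
    z = np.concatenate([x_t, h_prev, h_below])   # [x_t; h_{t-1}^l; h_t^{l-1}]
    i, f, g, o = np.split(W @ z + b, 4)          # pre-activations of the four blocks
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    s_t = f * s_prev + i * np.tanh(g)            # cell state   s_t^l
    h_t = o * np.tanh(s_t)                       # hidden state h_t^l
    return h_t, s_t
```

So the only DNC-specific change is what goes into x_t (the memory readings); the three-way gate input itself is just the usual multi-layer LSTM.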

vcchengcheng commented 7 years ago

Where is the dataset?