Closed: lisabug closed this issue 8 years ago
You need the clones to store the intermediate activations at each timestep. These are needed to compute the gradients correctly.
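A minimal sketch of the idea (Torch7 nn; nn.Linear stands in for the per-timestep LSTM module, and T and the sizes are made up, so this is not the actual practical6 code). Each clone shares the weights and gradient buffers, but keeps its own input/output buffers from its forward call, which is exactly what backward() needs later:

```lua
require 'nn'

local T = 5                      -- number of unrolled timesteps (assumption)
local proto = nn.Linear(10, 10)  -- stand-in for the per-timestep LSTM module

-- Clones share weight/bias and gradWeight/gradBias, but each clone has
-- its own activation buffers.
local clones = {}
for t = 1, T do
  clones[t] = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
end

-- Forward through time, keeping every intermediate input/output.
local inputs, outputs = {}, {}
local h = torch.zeros(10)
for t = 1, T do
  inputs[t] = h
  h = clones[t]:forward(inputs[t])
  outputs[t] = h
end

-- Backward through time: clone t still holds the activations from its own
-- forward call, so its backward() uses the right state. The shared
-- gradWeight/gradBias accumulate gradients across all timesteps.
local dh = torch.ones(10)
for t = T, 1, -1 do
  dh = clones[t]:backward(inputs[t], dh)
end
```

With a single module instead of clones, each new forward call would overwrite the buffers from the previous timestep, so by the time you ran backward from t = T down to 1, only the activations of the last step would still be available and the earlier gradients would be wrong.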
I got it, thanks for your help :)
I'm curious about why we should unroll the LSTM to T timesteps. Since all the copies share the same parameters, every backprop step changes the LSTM's parameters and gradParameters anyway. Why can't we just use one LSTM, run the forward pass repeatedly from t = 1 to T, and then run the backward pass from t = T down to 1? Because I want to apply the LSTM as a decoder to generate sentences, I have to handle sequences of variable length. I tried this approach, but it failed. Could someone help me? Thanks.