oxford-cs-ml-2015 / practical6

Practical 6: LSTM language models
https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/

Backprop through clones #1

Closed: AjayTalati closed this issue 9 years ago

AjayTalati commented 9 years ago

I'm not sure I understand the backprop through the LSTM timestep on lines 110-112 of train.lua.

Any chance of an explanation? Thanks :)

AjayTalati commented 9 years ago

OK - someone smart pointed out to me that lines 110-112 follow the same interface as module:backward,

where,

```
inputTable = {input(t), cell(t-1), output(t-1)}
gradOutputTable = {gradCell(t), gradOutput(t)}
gradInput(t), gradCell(t-1), gradOutput(t-1) = unpack(lstm:backward(inputTable, gradOutputTable))
```

So that makes sense now :+1:
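
To spell it out, the per-timestep call on lines 110-112 looks roughly like this (I'm writing the variable names from memory, so they may not match train.lua exactly):

```lua
-- backward through the LSTM clone at time t
-- inputTable      = {input(t), cell(t-1), output(t-1)}
-- gradOutputTable = {gradCell(t), gradOutput(t)}
dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
    {embeddings[t], lstm_c[t-1], lstm_h[t-1]},
    {dlstm_c[t], dlstm_h[t]}
))
```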

On line 103, may I ask why the gradient of the loss w.r.t. the hidden layer vector dlstm_h at time t = opt.seq_length is not set to dfinalstate_h, which is defined on line 67, i.e.

```lua
local dlstm_h = {[opt.seq_length]=dfinalstate_h} -- output values of LSTM
```

because the gradient dlstm_h[opt.seq_length] is used on line 112 to start the backprop recursion?

Thanks for your help :+1:

sherjilozair commented 9 years ago

dlstm_h[opt.seq_length] is assigned to on line 107, by backpropping from the outputs.

So, we actually do not need it as a parameter. I think line 67 can be safely removed, without any issues.
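
To illustrate, the backward loop does something like this at each timestep (a sketch from memory, so the exact variable and clone names may differ from train.lua):

```lua
for t = opt.seq_length, 1, -1 do
    -- gradient of the loss w.r.t. the prediction at time t
    local doutput_t = clones.criterion[t]:backward(predictions[t], y[{{}, t}])
    if t == opt.seq_length then
        -- at the last timestep there is no gradient coming from the future,
        -- so the output layer's gradient is all of dlstm_h[t]: nothing needs
        -- to be seeded from dfinalstate_h
        dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t)
    else
        -- at earlier timesteps, add it to the gradient already flowing back
        -- from the clone at t+1
        dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
    end
    -- ... then the LSTM clone's backward call (lines 110-112) runs as above
end
```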

bshillingford commented 9 years ago

Sorry for the delayed response. Yes, that is correct. That code is a remnant of the state (the c and h pair) being fed as input to something else, such as another LSTM as in Ilya Sutskever's encoder-decoder model, or a classifier for sequence classification.
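
Purely as an illustration (the `classifier` module and `dclassifier_loss` gradient below are made-up names, not anything in train.lua), the seed for the backward recursion would in that case come from the downstream module instead of being zeros:

```lua
-- hypothetical downstream consumer of the final hidden state
local dfinalstate_h = classifier:backward(lstm_h[opt.seq_length], dclassifier_loss)
-- seed the backward recursion through the clones with it
local dlstm_h = {[opt.seq_length] = dfinalstate_h}
```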

AjayTalati commented 9 years ago

Hi Brendan,

thanks for taking the time to answer,

> That code is a remnant of the state (the c and h pair) being fed as input to something else, such as another LSTM as in Ilya Sutskever's encoder-decoder model,

Yes, that makes sense. For LSTM variational autoencoders/decoders, as in the DRAW paper, things seem to get rather involved. My implementation requires dlstm_h[opt.seq_length] for the decoder to be assigned a final state in order to start the backward message passing.

[image: decoder_module_backward_graph]

Thanks once again for making your code and teaching materials open source. They're great :+1:

Best regards,

Aj