OK - someone smart pointed out to me that lines 110-112 follow the same interface as module:backward
where,
inputTable = {input(t), cell(t-1), output(t-1)}
gradOutputTable = {gradCell(t), gradOutput(t)}
gradInput(t), gradCell(t-1), gradOutput(t-1) = unpack(lstm:backward(inputTable, gradOutputTable))
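Writing it out, the full backward-through-time loop looks roughly like this (I'm reconstructing names like clones, embeddings, predictions and dembeddings from how the code reads, so treat this as a sketch rather than a verbatim copy of train.lua):

-- dlstm_c[opt.seq_length] is assumed to be pre-seeded with zeros before this loop
for t = opt.seq_length, 1, -1 do
    -- gradient of the loss w.r.t. the prediction at time t
    local doutput_t = clones.criterion[t]:backward(predictions[t], y[{{}, t}])
    -- gradient w.r.t. the hidden state: assign at t = T, accumulate for t < T,
    -- since for t < T the hidden state also feeds the next timestep
    if t == opt.seq_length then
        dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t)
    else
        dlstm_h[t]:add(clones.softmax[t]:backward(lstm_h[t], doutput_t))
    end
    -- the module:backward interface discussed above (lines 110-112)
    dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
        {embeddings[t], lstm_c[t-1], lstm_h[t-1]},
        {dlstm_c[t], dlstm_h[t]}
    ))
end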
So that makes sense now :+1:
On line 103, may I ask why the gradient of the loss w.r.t. the hidden state vector, dlstm_h at time t = opt.seq_length, is not set to dfinalstate_h, which is defined on line 67, i.e.

local dlstm_h = {[opt.seq_length]=dfinalstate_h} -- output values of LSTM

given that the gradient dlstm_h[opt.seq_length] is used on line 112 to start the backprop recursion?
Thanks for your help :+1:
dlstm_h[opt.seq_length] is being assigned to on line 107, by backpropagating from the outputs.
So we actually do not need it as a parameter. I think line 67 can be safely removed without any issues.
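Concretely, paraphrasing the two lines in question (the exact expressions here are a sketch from memory, not a verbatim copy):

-- line 67 pre-seeds the table entry:
local dlstm_h = {[opt.seq_length] = dfinalstate_h}
-- but on the first backward iteration (t == opt.seq_length), line 107
-- assigns that same entry from the criterion/softmax gradient:
dlstm_h[t] = clones.softmax[t]:backward(lstm_h[t], doutput_t)
-- so the value written on line 67 is overwritten before it is ever read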
Sorry for the delayed response. Yes, that is correct. That code is a remnant of the state (the c and h pair) being the input to something, such as another LSTM as in Ilya Sutskever's encoder-decoder model, or a classifier for sequence classification.
Hi Brendan,
Thanks for taking the time to answer.
"That code is a remnant of the state (the c and h pair) being the input to something, such as another LSTM as in Ilya Sutskever's encoder-decoder model."
Yes, that makes sense. For LSTM variational autoencoders/decoders, like in the DRAW paper, things seem to get rather involved. My implementation requires dlstm_h[opt.seq_length] of the decoder to be assigned a final-state gradient to start the backward message passing.
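For what it's worth, the seeding I mean looks roughly like this (dfinalstate_c / dfinalstate_h here stand for whatever gradients the downstream module's :backward() returns for the final cell and hidden state; the names are illustrative):

local T = opt.seq_length
-- gradients flowing back into the final LSTM state from the module it feeds,
-- i.e. whatever the downstream module's :backward() returns for c(T) and h(T)
local dlstm_c = {[T] = dfinalstate_c}
local dlstm_h = {[T] = dfinalstate_h}
-- inside the backward loop, per-timestep gradients then have to be added to
-- dlstm_h[t] with :add() rather than assigned, so this seed is not overwritten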
Thanks once again for making your code and teachings open source. They're great :+1:
Best regards,
Aj
Not sure I understand the backprop through the LSTM timestep on lines 110-112 in train.lua. Any chance of an explanation? Thanks :)