macournoyer / neuralconvo

Neural conversational model in Torch

Why do we append all the previous tokens of the decoder to generate next token? #39

Closed · vikram-gupta closed this issue 8 years ago

vikram-gupta commented 8 years ago

Hi @macournoyer @chenb67 ,

In the `eval` function of seq2seq.lua, we append every token generated so far by the decoder in order to generate the next token. Why do we do this?

My understanding was that the decoder should be able to remember the older tokens in its internal state, so we should not need to feed all the previously generated tokens again; feeding only the latest generated token might be enough.


```lua
local prediction = self.decoder:forward(torch.Tensor(output))[#output]
-- ...
next_output = wordIds[1]
-- Here we are appending the previously generated tokens
table.insert(output, next_output)
```
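
For context, the surrounding decoding loop in `Seq2Seq:eval` looks roughly like the sketch below. This is a reconstruction for illustration, not a verbatim copy of the repo; `maxOutputLength` is an assumed cap, and `self.goToken` / `self.eosToken` follow the naming used in seq2seq.lua:

```lua
-- Rough sketch of the greedy decoding loop (reconstruction, not exact repo code).
-- Note that the entire `output` table is re-fed to the decoder at every step.
local output = {self.goToken}

for i = 1, maxOutputLength do
  -- Forward the whole sequence generated so far and keep the last time step's
  -- prediction (a vector of log-probabilities over the vocabulary).
  local prediction = self.decoder:forward(torch.Tensor(output))[#output]
  local prob, wordIds = prediction:topk(1, 1, true, true)

  local next_output = wordIds[1]
  if next_output == self.eosToken then break end

  -- Append the new token so it is re-fed on the next iteration.
  table.insert(output, next_output)
end
```
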
chenb67 commented 8 years ago

Hi @vikram-gupta,

The way we currently use the decoder (simply calling the forward method) zeroes the state vector on every call. This behaviour comes from the Sequencer module, which calls the forget method of its child AbstractRecurrent modules. [1]

It is, however, possible to use the Sequencer:remember() method to make it keep the state between calls, which could save some computation.
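
A minimal sketch of how that could look, assuming the decoder is wrapped in a Sequencer and reusing the `goToken` / `eosToken` naming from seq2seq.lua; `maxOutputLength` is an assumed cap, and this is an illustration, not code from the repo:

```lua
-- Hypothetical sketch: decode one token at a time with a persistent hidden state.
-- remember('eval') keeps the state across forward() calls during evaluation,
-- so only the newest token has to be fed at each step.
self.decoder:remember('eval')
self.decoder:forget()  -- reset the state explicitly before decoding a new reply

local output = {self.goToken}
local next_output = self.goToken

for i = 1, maxOutputLength do
  -- Feed only the latest token; the Sequencer carries over the earlier state.
  local prediction = self.decoder:forward(torch.Tensor({next_output}))[1]
  local prob, wordIds = prediction:topk(1, 1, true, true)

  next_output = wordIds[1]
  if next_output == self.eosToken then break end
  table.insert(output, next_output)
end
```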

Chen

[1] https://github.com/Element-Research/rnn#sequencer

vikram-gupta commented 8 years ago

Thanks @chenb67