Open Beronx86 opened 8 years ago

LSTM.apply uses 'states' and 'cells' as recurrent states and returns both as outputs. I think returning 'cells' is unnecessary: it makes the LSTM interface differ from GRU's, and it makes LSTM incompatible with SequenceContentAttention and SequenceGenerator, since the LSTM cells should not be used as inputs to them. The problem seems to be that the recurrent.outputs variables must contain the recurrent.states variables.
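For concreteness, here is a minimal sketch (added for this writeup, not part of the original report) of the interface difference being described, assuming the Blocks bricks API, where the recurrent decorator exposes the states and outputs lists on apply:

```python
from blocks.bricks.recurrent import LSTM, GRU

lstm, gru = LSTM(dim=3), GRU(dim=3)

# LSTM carries two recurrent states and returns both as outputs,
# while GRU carries and returns only one.
print(lstm.apply.states)   # ['states', 'cells']
print(lstm.apply.outputs)  # ['states', 'cells']
print(gru.apply.outputs)   # ['states']
```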
You can't say that LSTM is incompatible with SequenceContentAttention. The attention class is completely agnostic to where the data it processes comes from. Just pass "states" as the "attended" input to SequenceContentAttention, and everything should work.
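A standalone sketch of that suggestion (mine, not from the thread), assuming the SequenceContentAttention constructor and take_glimpses signature as found in Blocks at the time; the dimensions and initializers are illustrative:

```python
from theano import tensor
from blocks.bricks.recurrent import LSTM
from blocks.bricks.attention import SequenceContentAttention
from blocks.initialization import IsotropicGaussian, Constant

dim = 3
lstm = LSTM(dim=dim, weights_init=IsotropicGaussian(0.01))
lstm.initialize()

attention = SequenceContentAttention(
    state_names=['states'], state_dims=[dim],
    attended_dim=dim, match_dim=dim,
    weights_init=IsotropicGaussian(0.01), biases_init=Constant(0))
attention.initialize()

x = tensor.tensor3('x')        # (time, batch, 4 * dim) gated LSTM inputs
states, cells = lstm.apply(x)  # LSTM returns both recurrent states

# Only the hidden states are attended over; 'cells' is simply not passed.
weighted_averages, weights = attention.take_glimpses(
    attended=states, states=states[-1])
```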
I am not 100% sure that LSTM can be used in AttentionRecurrent. This is something worth investigating.
That said, if it works for AttentionRecurrent, it should also work for SequenceGenerator.
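To make "works for AttentionRecurrent" concrete, below is a sketch of the wiring one would test, following the pattern of the Blocks machine-translation example as I remember it; the names and sizes are illustrative, and whether an LSTM transition behaves correctly inside AttentionRecurrent is precisely the open question:

```python
from blocks.bricks.recurrent import LSTM
from blocks.bricks.attention import SequenceContentAttention
from blocks.bricks.sequence_generators import (
    SequenceGenerator, Readout, SoftmaxEmitter, LookupFeedback)

dim, vocab_size = 100, 30000   # illustrative sizes
transition = LSTM(dim=dim, name='transition')
attention = SequenceContentAttention(
    state_names=transition.apply.states,   # ['states', 'cells']
    attended_dim=dim, match_dim=dim, name='attention')
readout = Readout(
    readout_dim=vocab_size,
    source_names=['states', 'feedback',
                  attention.take_glimpses.outputs[0]],
    emitter=SoftmaxEmitter(name='emitter'),
    feedback_brick=LookupFeedback(vocab_size, dim),
    name='readout')
# This is where the question lives: SequenceGenerator wraps the
# transition in AttentionRecurrent, which must handle both LSTM states.
generator = SequenceGenerator(
    readout=readout, transition=transition, attention=attention,
    name='generator')
```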
FYI, new sequence-generation tools are coming, with which things should be a lot simpler: https://github.com/mila-udem/blocks-extras/blob/master/blocks_extras/bricks/sequence_generator2.py