mila-iqia / blocks

A Theano framework for building and training neural networks

LSTM.apply interface. recurrent.states and recurrent.outputs #1025

Open · Beronx86 opened 8 years ago

Beronx86 commented 8 years ago

LSTM.apply uses 'states' and 'cells' as recurrent states and returns both as outputs. I think returning 'cells' is unnecessary: it makes the LSTM interface different from that of GRU, and it makes LSTM incompatible with SequenceContentAttention and SequenceGenerator, because LSTM cells should not be used as inputs to them. The problem seems to be that the recurrent.outputs variables must contain the recurrent.states variables.
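For context, here is a minimal sketch of the asymmetry being described (assuming a blocks/Theano installation from that era; the dimensions and variable names are made up for illustration):

```python
from theano import tensor
from blocks.bricks.recurrent import LSTM, GatedRecurrent
from blocks.initialization import Constant, IsotropicGaussian

dim = 3
x = tensor.tensor3('x')          # LSTM inputs, expected last dim 4 * dim
z = tensor.tensor3('z')          # GRU inputs, expected last dim dim
gates = tensor.tensor3('gates')  # GRU gate inputs, expected last dim 2 * dim

lstm = LSTM(dim=dim, weights_init=IsotropicGaussian(0.01),
            biases_init=Constant(0))
lstm.initialize()
# LSTM declares states=['states', 'cells'] and outputs=['states', 'cells'],
# so apply() returns two variables.
states, cells = lstm.apply(inputs=x)

gru = GatedRecurrent(dim=dim, weights_init=IsotropicGaussian(0.01),
                     biases_init=Constant(0))
gru.initialize()
# GatedRecurrent declares states=['states'] and outputs=['states'],
# so apply() returns a single variable.
gru_states = gru.apply(inputs=z, gate_inputs=gates)
```

The extra `cells` output is what leaks into downstream bricks that expect a single recurrent output per transition.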

rizar commented 8 years ago

You can't say that LSTM is not compatible with SequenceContentAttention. The attention class is completely agnostic to where the data it processes comes from. Just pass "states" as the "attended" to SequenceContentAttention, and everything should work.
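A hedged sketch of that suggestion (the `states` and `decoder_states` variables here are illustrative stand-ins; the constructor and application names follow blocks' `SequenceContentAttention`, but details may differ across versions):

```python
from theano import tensor
from blocks.bricks.attention import SequenceContentAttention
from blocks.initialization import Constant, IsotropicGaussian

dim = 3
states = tensor.tensor3('states')             # the LSTM's 'states' output
decoder_states = tensor.matrix('dec_states')  # current decoder state

attention = SequenceContentAttention(
    state_names=['states'], state_dims=[dim],
    attended_dim=dim, match_dim=dim,
    weights_init=IsotropicGaussian(0.01), biases_init=Constant(0))
attention.initialize()

# 'cells' never appears here: the attended is just the 'states' sequence.
glimpses, weights = attention.take_glimpses(
    attended=states, states=decoder_states)
```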

I am not 100% sure that LSTM can be used in AttentionRecurrent. This is something worth investigating.

On the other hand, if it works for AttentionRecurrent, it should also work for SequenceGenerator.
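The wiring to investigate would look roughly like the following sketch, patterned after the blocks machine-translation example. It is not verified to work with an LSTM transition, which is exactly the open question; note that `transition.apply.states` includes 'cells' here:

```python
from blocks.bricks.recurrent import LSTM
from blocks.bricks.attention import SequenceContentAttention
from blocks.bricks.sequence_generators import (
    SequenceGenerator, Readout, SoftmaxEmitter, LookupFeedback)
from blocks.initialization import Constant, IsotropicGaussian

dim, vocab_size = 3, 100

transition = LSTM(dim=dim, name='transition')
attention = SequenceContentAttention(
    # ['states', 'cells'] for an LSTM -- the crux of this issue
    state_names=transition.apply.states,
    attended_dim=dim, match_dim=dim)
readout = Readout(
    readout_dim=vocab_size,
    source_names=['states', 'feedback',
                  attention.take_glimpses.outputs[0]],
    emitter=SoftmaxEmitter(),
    feedback_brick=LookupFeedback(vocab_size, dim))
# Passing `attention` makes SequenceGenerator build an
# AttentionRecurrent around the transition internally.
generator = SequenceGenerator(
    readout=readout, transition=transition, attention=attention,
    weights_init=IsotropicGaussian(0.01), biases_init=Constant(0))
generator.initialize()
```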

FYI, new sequence-generation tools are coming, with which things should be a lot simpler: https://github.com/mila-udem/blocks-extras/blob/master/blocks_extras/bricks/sequence_generator2.py
