keras-team / keras

Deep Learning for humans
http://keras.io/

Bidirectional RNN updates? #856

Closed dbonadiman closed 7 years ago

dbonadiman commented 9 years ago

Hello everyone, over the past few days I've looked into the many different issues posted about the possibility of implementing bidirectional RNNs in Keras. After digging into it a bit, I came up with a simple solution that seems to converge, but I need to understand to what extent it is theoretically sound before I can push an update.

From what I understand, we need to read the input both forwards and backwards; the two outputs should then be merged in some way before being fed to the following layers.

Using the Graph model, and with a small modification to the existing recurrent layers, we are one step away: the Theano scan function has a go_backwards argument, and we can expose it through the __init__ of our Recurrent layer. That gives us a bidirectional recurrent neural network.

Example: a bidirectional LSTM (a sketch; in practice an Embedding layer would sit between the integer input and the LSTMs, and the final node here is illustrative since the original post left it as a placeholder):

from keras.models import Graph
from keras.layers.recurrent import LSTM
from keras.layers.core import TimeDistributedDense

model = Graph()
model.add_input(name='input', input_shape=(1,), dtype='int')
model.add_node(LSTM(256, return_sequences=True), name='forward', input='input')
model.add_node(LSTM(256, return_sequences=True, go_backwards=True), name='backward', input='input')
# the rest of the network goes here; it consumes both directions, merged by concatenation:
model.add_node(TimeDistributedDense(1, activation='sigmoid'), name='output_layers', inputs=['forward', 'backward'], merge_mode='concat')
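For context, a minimal sketch of what this change could look like inside the layer (simplified, with hypothetical names like _step and _initial_state; the real Keras layer also handles weights, activations, and masking). The layer only needs to store the flag and hand it to theano.scan, which then walks the time axis in reverse:

import theano
from keras.layers.core import Layer

class BackwardCapableRecurrent(Layer):
    # hypothetical, simplified recurrent layer for illustration only
    def __init__(self, go_backwards=False, **kwargs):
        self.go_backwards = go_backwards
        super(BackwardCapableRecurrent, self).__init__(**kwargs)

    def get_output(self, train=False):
        X = self.get_input(train).dimshuffle((1, 0, 2))  # to (time, batch, features): scan walks axis 0
        outputs, updates = theano.scan(self._step,  # _step: per-timestep recurrence (hypothetical name)
                                       sequences=X,
                                       outputs_info=self._initial_state(X),  # hypothetical helper
                                       go_backwards=self.go_backwards)
        return outputs[-1]  # final hidden state, i.e. return_sequences=False behaviour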

Am I missing something, or does this get us there?

By the way, I tried it locally and it converges.

EderSantana commented 8 years ago

This solution looks really nice @dbonadiman. Would you have time to write an example like this https://github.com/EderSantana/seya/blob/master/examples/imdb_brnn.py to share? It would be nice to compare results and see whether the two implementations get the same thing.

dbonadiman commented 8 years ago

I'm working on it. What is your current benchmark in terms of accuracy at the 4th epoch? I ran a test with the same number of parameters, but we probably need some sort of cross-validation, or at least runs with different seeds, to compare the results properly.

Just to sum up mine, in three different configurations (all on a Tesla K40c GPU):

Bidirectional(return_sequences=True) + TimeDistributedMerge, forward LSTM + backward GRU, size 64: 0.8352 max accuracy at epoch 1, 0.8204 at epoch 4 (126 sec per epoch).

Forward LSTM (the Keras imdb example): 0.8378 max accuracy at epoch 2, 0.8302 at epoch 4 (83 sec per epoch).

Bidirectional(return_sequences=False), forward LSTM + backward GRU, size 64: 0.8328 max accuracy at epoch 2, 0.8318 at epoch 4 (120 sec per epoch).

There are pros and cons to both implementations (mine and yours). In particular, mine lets us stack as many layers as we want, directly and in any order we want (for instance a bidirectional GRU feeding into a bidirectional LSTM). The downside is that it requires modifying all the other recurrent layers and needs a Graph model.

My results may be inconsistent due to reproducibility problems when using dropout on the GPU.

jeffzhengye commented 8 years ago

@dbonadiman I suspect the bidirectional RNN has a problem when we set return_sequences=False. Say we do left padding (e.g. input=[0, 0, 0, 3, 5], where 0 indicates the mask in the embedding): it is fine to scan from left to right. However, when you reverse and scan from right to left (so the input becomes [5, 3, 0, 0, 0]), the output we get with return_sequences=False is the hidden state after the final '0', instead of after '3', which looks wrong to me. @fchollet what do you think? Is it a bug?
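To make the concern concrete, here is a tiny numpy-only illustration (not Keras code) of what a backward scan sees on a left-padded sequence:

import numpy as np

x = np.array([0, 0, 0, 3, 5])  # left-padded input, 0 is the mask value
print(x[::-1])                 # [5 3 0 0 0]: a backward scan sees 5, 3, then three padding steps
# with return_sequences=False the layer returns the state after the LAST step,
# which here is the state after a padding zero, not after the real token 3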

BTW, since you mentioned seya: seya's implementation also looks buggy to me, since it does Xb = Xb[::-1]. That reverses the first dimension, which is unfortunately the batch-size dimension.
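A quick way to see the axis problem (plain numpy, assuming Xb is laid out as (batch, time, features) as in Keras):

import numpy as np

Xb = np.arange(12).reshape(2, 3, 2)  # (batch=2, time=3, features=2)
Xb[::-1]      # swaps the two samples in the batch: the reported bug
Xb[:, ::-1]   # reverses the time axis within each sample: the intended behaviour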

dbonadiman commented 8 years ago

I will look into it, thanks for reporting. Just one general question: did you use a masked input, e.g. Embedding(100, 30, mask_zero=True)?

jeffzhengye commented 8 years ago

@dbonadiman yes I did

dbonadiman commented 8 years ago

I will go through the Theano implementation of scan to understand whether there are problems with masking and go_backwards. I'm a bit busy with actual research these days, but if this leads somewhere it may help my models too, so I will definitely spend some time on it.
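If it helps frame the investigation: masking in a scan step is usually implemented by carrying the previous state through masked timesteps, in which case the trailing zeros of a backward pass over left-padded data would not overwrite the last real state. A rough sketch of that pattern (hypothetical names, not the actual Keras code):

def masked_step(x_t, mask_t, h_prev):
    h_t = step(x_t, h_prev)  # step: the ordinary, unmasked recurrence (hypothetical)
    # where mask_t is 0 (padding), carry the previous state through unchanged
    return mask_t * h_t + (1 - mask_t) * h_prev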

stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.