keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

How to implement a deep bidirectional LSTM? #1629

Closed: udani969 closed this issue 8 years ago

udani969 commented 8 years ago

I am trying to implement an LSTM-based speech recognizer. So far I have set up a bidirectional LSTM (I think it is working as one) by following the Merge layer example. Now I want to add another bidirectional LSTM layer on top, making it a deep bidirectional LSTM, but I cannot figure out how to feed the output of the two merged layers into a second pair of LSTM layers. I don't know whether this is possible with Keras. I hope someone can help me with this.

The code for my single-layer bidirectional LSTM is as follows:

from keras.models import Sequential
from keras.layers import Activation, LSTM, Merge, TimeDistributedDense
from keras.optimizers import SGD

# Forward LSTM over the input sequences (99 timesteps, 13 features)
left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
              forget_bias_init='one', return_sequences=True, activation='tanh',
              inner_activation='sigmoid', input_shape=(99, 13)))

# Backward LSTM: go_backwards=True feeds the sequence in reverse
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

# Sum the forward and backward outputs at every timestep
model = Sequential()
model.add(Merge([left, right], mode='sum'))

# Per-timestep softmax over the output classes
model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
# The merged model has two inputs, so the same X is passed twice
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches,
          validation_data=([X_test, X_test], Y_test), verbose=1,
          show_accuracy=True)

The dimensions of my X and Y arrays are as follows:

100 train sequences
20 test sequences
X_train shape: (100, 99, 13)
X_test shape: (20, 99, 13)
y_train shape: (100, 99, 11)
y_test shape: (20, 99, 11)

farizrahman4u commented 8 years ago

#1282 will help. It works only for Theano, though.

farizrahman4u commented 8 years ago

Or you could simply use the following fork function to make two copies of your merged model:

from keras.models import Sequential
from keras.layers import Activation, LSTM, Merge, TimeDistributedDense
from keras.optimizers import SGD

def fork(model, n=2):
    # Wrap the same model in n Sequential containers so each copy
    # can be extended independently while sharing the layers below
    forks = []
    for i in range(n):
        f = Sequential()
        f.add(model)
        forks.append(f)
    return forks

# First bidirectional LSTM layer
left = Sequential()
left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
              forget_bias_init='one', return_sequences=True, activation='tanh',
              inner_activation='sigmoid', input_shape=(99, 13)))
right = Sequential()
right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', input_shape=(99, 13), go_backwards=True))

model = Sequential()
model.add(Merge([left, right], mode='sum'))

# Add the second bidirectional LSTM layer on top of the merged output
left, right = fork(model)

left.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
              forget_bias_init='one', return_sequences=True, activation='tanh',
              inner_activation='sigmoid'))

right.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', return_sequences=True, activation='tanh',
               inner_activation='sigmoid', go_backwards=True))

# Rest of the stuff as before
model = Sequential()
model.add(Merge([left, right], mode='sum'))

model.add(TimeDistributedDense(nb_classes))
model.add(Activation('softmax'))

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
print("Train...")
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches,
          validation_data=([X_test, X_test], Y_test), verbose=1,
          show_accuracy=True)

It would be better to use the Bidirectional wrapper or the Graph model for this sort of thing.
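
For reference, a minimal sketch of the same stacked architecture using the Bidirectional wrapper (available in newer Keras versions; hidden_units and nb_classes are placeholders carried over from the code above):

from keras.models import Sequential
from keras.layers import Dense, LSTM, TimeDistributed
from keras.layers.wrappers import Bidirectional

model = Sequential()
# Each wrapper runs the LSTM forwards and backwards over the input
# and merges the two outputs (concatenation by default)
model.add(Bidirectional(LSTM(hidden_units, return_sequences=True),
                        input_shape=(99, 13)))
model.add(Bidirectional(LSTM(hidden_units, return_sequences=True)))
model.add(TimeDistributed(Dense(nb_classes, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='sgd')

Unlike the manual go_backwards approach, the wrapper also re-reverses the backward pass's output so the two directions stay aligned in time before merging.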

udani969 commented 8 years ago

Wow, it worked. I used the fork method, because some checks failed with the wrapper approach; I only just got it working. Thanks a lot for the support.

talentlei commented 8 years ago

@farizrahman4u I used your code above and trained a model, but when I load the model and test it, I get the following error:

File "BLSTM_NER.py", line 1058, in test() File "BLSTM_NER.py", line 1038, in test ner.rnn_test(resfile,model_file,weights) File "BLSTM_NER.py", line 943, in rnn_test out = model.predict([self.X_test,self.X_test],batch_size=batch_size) File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 693, in predict return self._predict_loop(self._predict, X, batch_size, verbose)[0] File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 356, in _predict_loop batch_outs = f(ins_batch) File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 448, in call return self.function(*inputs) File "/home/cl/download/Theano/theano/compile/function_module.py", line 845, in call self.inv_finder[c])) TypeError: Missing required input: <TensorType(float32, 3D)>

My test code is as follows:

    print "load model"
       model = model_from_json(open(my_model).read())
       model.load_weights(weights)
       print "load model finish" 
       out = model.predict([self.X_test,self.X_test],batch_size=batch_size)

Why am I getting this error? Can you help me? Thanks.

Windy-Ground commented 8 years ago

https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py

vinayakumarr commented 8 years ago

I was trying @farizrahman4u's deep bidirectional LSTM example on my dataset, which has 50000 rows and 20 columns (19 features and 1 class label), with

X_train = sequence.pad_sequences(X_train, maxlen=100)
X_test = sequence.pad_sequences(X_test, maxlen=100)

I am getting the following error (screenshot attached). I know it is because of the dimension shape in the model.fit function, but I don't know how to resolve it.

farizrahman4u commented 8 years ago

The problem is the shape of your input data. The error message is pretty clear: an LSTM needs 3D input of shape (samples, timesteps, features), but you are providing 2D data. The example I provided above is obsolete; use the functional API instead.
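
For example, a rough functional-API sketch of a stacked bidirectional model (the shapes and layer sizes here are illustrative placeholders, using Keras 1.x-style Model arguments):

from keras.models import Model
from keras.layers import Input, Dense, LSTM, TimeDistributed
from keras.layers.wrappers import Bidirectional

# An LSTM needs 3D input of shape (samples, timesteps, features);
# a 2D array must first be windowed or reshaped accordingly, e.g.
# X_train = X_train.reshape((n_samples, timesteps, features))

inputs = Input(shape=(100, 19))  # 100 timesteps, 19 features
x = Bidirectional(LSTM(64, return_sequences=True))(inputs)
x = Bidirectional(LSTM(64, return_sequences=True))(x)
outputs = TimeDistributed(Dense(23, activation='softmax'))(x)

model = Model(input=inputs, output=outputs)
model.compile(loss='categorical_crossentropy', optimizer='sgd')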

9thDimension commented 8 years ago

@farizrahman4u When you say "functional API", what do you mean exactly?

I saw this syntax here: model.add(Bidirectional(LSTM(10, input_shape=(5, 10), return_sequences=True))), but I don't know which package to import the Bidirectional() class from.

And this syntax here: backwards = LSTM(64, go_backwards=True)(embedded). But then I'm not exactly sure how to build a multi-layer bidirectional LSTM (do I use the forking approach you described above on Feb 3rd?).

P.S. I want many-to-many sequence labelling, so where do I need to put the return_sequences=True flags?

farizrahman4u commented 8 years ago

Google for the Keras functional API. The Bidirectional wrapper is from my seq2seq library.

9thDimension commented 8 years ago

@farizrahman4u Oh, it's part of the seq2seq library, I see.

Is this the correct usage to build a 2-layer bidirectional LSTM that outputs a category prediction for every input character?

Input characters are 43-dimensional, and there are 5 possible output categories.

from keras.models import Sequential
from keras.layers import Activation, LSTM, Merge, TimeDistributedDense
from keras.optimizers import SGD

def fork(model, n=2):
    forks = []
    for i in range(n):
        f = Sequential()
        f.add(model)
        forks.append(f)
    return forks

# First bidirectional LSTM layer

forward = Sequential()
forward.add(LSTM(output_dim=512, input_shape=(50, 43), return_sequences=True))
backward = Sequential()
backward.add(LSTM(output_dim=512, input_shape=(50, 43), return_sequences=True, go_backwards=True))

model = Sequential()
model.add(Merge([forward, backward], mode='concat'))

# Second bidirectional LSTM layer

forward_2, backward_2 = fork(model)

forward_2.add(LSTM(output_dim=512, return_sequences=True))
backward_2.add(LSTM(output_dim=512, return_sequences=True, go_backwards=True))

model = Sequential()
model.add(Merge([forward_2, backward_2], mode='concat'))

# Softmax decision layer

model.add(TimeDistributedDense(output_dim=5))
model.add(Activation('softmax'))

# Optimizer function

sgd = SGD(lr=0.1, decay=1e-5, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

print("Train...")
model.fit([X_train, X_train], Y_train, batch_size=1, nb_epoch=nb_epoches, validation_data=([X_test, X_test], Y_test), verbose=1, show_accuracy=True)

Also, for this type of architecture, do the inputs have to "overlap", like so:

x_0 = [0, 1, 2, 3, 4], y_0 = [A, B, C, D, E]
x_1 = [1, 2, 3, 4, 5], y_1 = [B, C, D, E, F]
x_2 = [2, 3, 4, 5, 6], y_2 = [C, D, E, F, G]

or not overlap, like so:

x_0 = [0, 1, 2, 3, 4],      y_0 = [A, B, C, D, E]
x_1 = [5, 6, 7, 8, 9],      y_1 = [F, G, H, I, J]
x_2 = [10, 11, 12, 13, 14], y_2 = [K, L, M, N, O] 
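
For what it's worth, the two schemes are just different strides over the same underlying sequence; a small numpy sketch (the function and variable names here are illustrative):

import numpy as np

def make_windows(x, y, window, stride):
    # stride == 1      -> overlapping windows (first scheme above)
    # stride == window -> non-overlapping windows (second scheme above)
    xs, ys = [], []
    for start in range(0, len(x) - window + 1, stride):
        xs.append(x[start:start + window])
        ys.append(y[start:start + window])
    return np.array(xs), np.array(ys)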

vinayakumarr commented 8 years ago

@farizrahman4u Before posting I already knew that the error I am getting is a dimension problem. I have a training set of 390321 examples with 23 classes, and a test set of 20000 examples with 40 features (I also have the correct labels for them). I am loading the train, test, and correct-label datasets and trying to apply a deep bidirectional stateful LSTM.

The training set is 390321 x 41 (40 features plus the class label), the test set is 20000 x 40, and the corrected labels are 20000 x 1.

How do I reshape the data and feed it to a deep bidirectional stateful LSTM?
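
For reference, a sketch of the kind of reshaping an LSTM expects (the sequence length chosen here is an arbitrary assumption):

import numpy as np

# Stand-in for the 2D feature matrix described above: (samples, features)
X_train = np.zeros((390321, 40), dtype='float32')

# Simplest option: treat every row as a sequence of one timestep
X_1step = X_train.reshape((X_train.shape[0], 1, 40))

# Or group consecutive rows into fixed-length sequences (here 10 timesteps);
# trim the tail so the row count divides evenly
seq_len = 10
n = (X_train.shape[0] // seq_len) * seq_len
X_seq = X_train[:n].reshape((-1, seq_len, 40))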

strin commented 8 years ago

@farizrahman4u @9thDimension When running an LSTM in the reverse direction, shouldn't the output correspond to input_n, input_{n-1}, input_{n-2}, ..., input_1? In that case, when concatenating with the output from the forward direction, shouldn't we reverse it first?

farizrahman4u commented 8 years ago

@strin I have added the Bidirectional wrapper to Keras; see the bidirectional LSTM example.

williamjqk commented 7 years ago

The official manual can be found here: https://keras.io/layers/wrappers/#bidirectional

grafael commented 7 years ago

I'm afraid the Bidirectional wrapper will not work with the Keras functional API. Any help with this sort of thing:

main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
lstm = LSTM(32)(x)
bidirectional = Bidirectional()(lstm)  # how should Bidirectional be instantiated?

jojonki commented 7 years ago

@grafael

How about this? Bidirectional takes a layer as its first argument:

bidirectional = Bidirectional(LSTM(32))(x)
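
Applied to the snippet above, a sketch (the wrapper is given the unapplied LSTM layer and is then called on the embedding output):

from keras.models import Model
from keras.layers import Input, Embedding, LSTM
from keras.layers.wrappers import Bidirectional

main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# Wrap the layer itself, not the result of calling it
bidirectional = Bidirectional(LSTM(32))(x)
model = Model(input=main_input, output=bidirectional)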

ylmeng commented 7 years ago

Doesn't the go_backwards option reverse the output order too? So model.add(Merge([left, right], mode='sum')) does not make sense (you must flip one of the outputs before adding)?

Ap1075 commented 6 years ago

@ylmeng Yes, it is handled automatically; you don't have to flip it before merging, as far as I know.
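
If you do build the two directions by hand rather than with the Bidirectional wrapper, the backward output can be re-aligned with something like the following sketch (K.reverse is a Keras backend function):

from keras import backend as K
from keras.layers import Lambda

# Reverse the time axis of the backward LSTM's output so that timestep t
# of both directions refers to the same input position before merging
flip = Lambda(lambda x: K.reverse(x, axes=1))
bwd_aligned = flip(bwd_output)  # bwd_output: a go_backwards LSTM's output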