keras-team / keras

Deep Learning for humans
http://keras.io/

How to build a Convolutional Bi-directional LSTM? #3322

Closed Davidott4 closed 7 years ago

Davidott4 commented 8 years ago

I'm trying to build a Convolutional Bi-directional LSTM to classify DNA sequences, à la this paper: DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences

[Figure: DanQ architecture, convolution and max-pooling followed by a bidirectional LSTM and dense layers]

The short version of it is to one-hot encode a DNA sequence: 'ATACG...' = [ [1,0,0,0], [0,0,0,1], [1,0,0,0], [0,1,0,0], [0,0,1,0], ...],

Then feed it to a convolution-ReLU-max-pooling layer to find motifs, then into a bidirectional LSTM network to learn long-distance dependencies.
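Here is a minimal sketch of that encoding (the column order A, C, G, T and the `one_hot_encode` helper name are just my choices for illustration):

```python
import numpy as np

BASE_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3}  # assumed column order

def one_hot_encode(seq, max_len=500):
    """Encode a DNA string as a (max_len, 4) one-hot array, zero-padded."""
    encoded = np.zeros((max_len, 4), dtype=np.float32)
    for i, base in enumerate(seq[:max_len]):
        encoded[i, BASE_INDEX[base]] = 1.0
    return encoded

print(one_hot_encode('ATACG')[:5])
```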

The original source code is here.

However, it uses an outdated version of Keras and depends on Seya, which I'd like to avoid. Here is my first attempt at building the model:

```python
from keras.layers import Input, Convolution1D, MaxPooling1D, Dropout, LSTM, merge, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(500, 4))  # max 500 bases, one-hot encoded
convo_1 = Convolution1D(320, filter_length=26, border_mode='valid',
                        activation='relu', subsample_length=1)(inputs)
maxpool_1 = MaxPooling1D(pool_length=13, stride=13)(convo_1)
drop_1 = Dropout(0.2)(maxpool_1)
# Forward and backward LSTMs over the pooled motif features
l_lstm = LSTM(320, return_sequences=True, go_backwards=False)(drop_1)
r_lstm = LSTM(320, return_sequences=True, go_backwards=True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)  # num_classes defined elsewhere

model = Model(inputs, out)
print('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
```

Unfortunately, the loss remained nearly constant during training, and the accuracy stayed constant as well. This leads me to believe that I have set the model up incorrectly, or that 1D convolution is the wrong kind of convolution for this input type. So I attempted to switch to 2D convolution:

```python
from keras.layers import Input, Convolution2D, MaxPooling2D, Flatten, Dropout, LSTM, merge, Dense
from keras.models import Model

inputs = Input(shape=(1, 500, 4))
convo_1 = Convolution2D(320, nb_row=15, nb_col=4, init='glorot_uniform',
                        activation='relu', border_mode='same')(inputs)
maxpool_1 = MaxPooling2D((15, 4))(convo_1)
flat_1 = Flatten()(maxpool_1)  # Flatten outputs 2D (batch, features)...
drop_1 = Dropout(0.2)(flat_1)
l_lstm = LSTM(320, return_sequences=True, go_backwards=False)(drop_1)  # ...but LSTM expects 3D input
r_lstm = LSTM(320, return_sequences=True, go_backwards=True)(drop_1)
merged = merge([l_lstm, r_lstm], mode='sum')
drop_2 = Dropout(0.5)(merged)
flat = Flatten()(drop_2)
dense_1 = Dense(320, activation='relu')(flat)
out = Dense(num_classes, activation='sigmoid')(dense_1)

model = Model(inputs, out)
print('compiling model')
model.compile(loss='binary_crossentropy', optimizer='rmsprop')
```

This gives me the following error when trying to feed the flattened layer into the LSTM:

```
Exception: Input 0 is incompatible with layer lstm_4: expected ndim=3, found ndim=2
```

Is this the correct setup for a convolutional bi-directional LSTM? The original code uses a Sequential model, but because this model branches, that doesn't seem like the right track. If so, then I likely need to move to the 2D convolution version, in which case how can I fix the input error?

codekansas commented 8 years ago

The 1D convolution is correct for this application. 2D convolution would imply that you're trying to convolve over the one-hot encoding dimension, which you're not. The loss staying constant probably has more to do with the fact that the network is fairly deep, so it could take a while to get any results. Could you give more information about training (how long, how you're presenting the dataset)?

A good sanity check might be to choose a positive and negative example from your dataset and feed them in over and over again to see if the network overfits on them.
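Something like this, roughly (a sketch, not tested; it assumes `num_classes` is 1, reuses the `one_hot_encode` sketch from the question, and `pos_seq`/`neg_seq` stand in for your two examples):

```python
import numpy as np

x_tiny = np.stack([one_hot_encode(pos_seq), one_hot_encode(neg_seq)])
y_tiny = np.array([[1.0], [0.0]])  # one positive, one negative label

# On two samples the loss should head toward zero within a few hundred
# epochs; if it doesn't, the model wiring is suspect, not the data volume.
model.fit(x_tiny, y_tiny, nb_epoch=300, batch_size=2, verbose=0)
print(model.evaluate(x_tiny, y_tiny, verbose=0))
```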

Also, a "true" biLSTM would probably use `merge([l_lstm, r_lstm], mode='concat', concat_axis=-1)`.
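That is, swapping out the summing merge in the model above. Concatenating along the last axis keeps the two directions' features separate, doubling the feature dimension from 320 to 640 instead of adding the two outputs together:

```python
# Concatenate forward and backward LSTM outputs instead of summing them.
merged = merge([l_lstm, r_lstm], mode='concat', concat_axis=-1)  # (batch, steps, 640)
```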

ddofer commented 7 years ago

Could you post the code you use to one-hot encode the DNA sequences? When I tried it on a similar problem, my model didn't learn much, and I had the same issue as you with 1D convolutions (no learning). Thanks!