hycis / bidirectional_RNN

bidirectional lstm
MIT License

Return_sequences? #4

Open cjmcmurtrie opened 9 years ago

cjmcmurtrie commented 9 years ago

Nice work, this looks very promising.

I was thinking of modifying it for something I'm working on, since it doesn't seem to implement return_sequences so that you can return a whole sequence of outputs. Or am I missing something?

If you want to give me a head start, how do you imagine modifying this to return a sequence of outputs? :)

hycis commented 9 years ago

For now, the default is to return the sequence of outputs: if the input is x_1, x_2, .., x_k, the output will be y_1, y_2, .., y_k. But if you just want the last output y_k, then in get_output you can take the last element of the sequence: y = forward + backward; return y[:, -1, :]
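To make the shapes concrete, here is a tiny NumPy stand-in (the sizes are made up, just to illustrate the slicing):

import numpy as np

# pretend layer output: one y_t per timestep for every example in the batch
y = np.zeros((32, 10, 64))      # (num_examples, seq_len, hidden_size)

y_last = y[:, -1, :]            # keep only y_k for each example
print(y.shape, y_last.shape)    # (32, 10, 64) (32, 64)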

cjmcmurtrie commented 9 years ago

Like this?

def get_output(self, train):
    forward = self.get_forward_output(train)
    backward = self.get_backward_output(train)
    # combine the two directions according to output_mode
    if self.output_mode == 'sum':
        output = forward + backward
    elif self.output_mode == 'concat':
        output = T.concatenate([forward, backward], axis=2)
    else:
        raise Exception('output mode is not sum or concat')
    if self.return_sequences:
        return output              # (num_examples, seq_len, output_dim)
    return output[:, -1, :]        # last timestep only
cjmcmurtrie commented 9 years ago

In the Keras LSTM, return_sequences is handled this way:

    if self.return_sequences:
        return outputs.dimshuffle((1, 0, 2))
    return outputs[-1]

Are you sure the tensor doesn't need to be transposed to plug into the other Keras modules?

Edit: Sorry, I just realized you handle the transposition in get_forward_output and get_backward_output.
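For my own sanity, here is a minimal Theano sketch of what that dimshuffle((1, 0, 2)) does, assuming the raw scan output is laid out as (seq_len, num_examples, hidden_size) (the sizes below are made up):

import numpy as np
import theano
import theano.tensor as T

# assumed raw layout from scan: (seq_len, num_examples, hidden_size)
outputs = T.tensor3('outputs')
# swap the first two axes so downstream layers see (num_examples, seq_len, hidden_size)
shuffled = outputs.dimshuffle((1, 0, 2))

x = np.zeros((10, 32, 64), dtype=theano.config.floatX)
print(shuffled.eval({outputs: x}).shape)   # (32, 10, 64)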

hycis commented 9 years ago

What you did is correct. After the dimension shuffle in get_forward_output and get_backward_output, the output has shape (num_examples, seq_len, output_dim), so output[:, -1, :] takes the last element of the sequence.

cjmcmurtrie commented 9 years ago

Ok, it seems to be working. I'm using two stacked LSTMs like this:

model = Sequential()
model.add(BiDirectionLSTM(embedding_size, hidden_size, init=initialize))
model.add(Dense(hidden_size, hidden_size, init=initialize))
model.add(Activation('relu'))
model.add(RepeatVector(maxlen))
model.add(BiDirectionLSTM(hidden_size, hidden_size, return_sequences=True, init=initialize))
model.add(TimeDistributedDense(hidden_size, output_size, activation='softmax', init=initialize))

And it seems to be training ok (I'm actually trying to decide if I need two bi-dir LSTMs or just one at the input layer side).
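For comparison, the variant with just one bi-directional layer at the input side would look roughly like this. It reuses the same placeholder names as above, and the plain LSTM(input_dim, output_dim, ..., return_sequences=True) decoder signature is my assumption about the old Keras API, not something from this repo:

model_single = Sequential()
# bi-directional encoder at the input side only
model_single.add(BiDirectionLSTM(embedding_size, hidden_size, init=initialize))
model_single.add(Dense(hidden_size, hidden_size, init=initialize))
model_single.add(Activation('relu'))
model_single.add(RepeatVector(maxlen))
# ordinary unidirectional LSTM decoder returning the full sequence
model_single.add(LSTM(hidden_size, hidden_size, return_sequences=True, init=initialize))
model_single.add(TimeDistributedDense(hidden_size, output_size, activation='softmax', init=initialize))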

I've forked the repo (sorry, I'm a bit new to GitHub), but I can add the changes here if you like.

hycis commented 9 years ago

Sure, feel free to add the changes and create a pull request if you can.