cjmcmurtrie opened this issue 9 years ago
For now, the default is to return the full sequence of outputs: if the input is x_1, x_2, ..., x_k, the output will be y_1, y_2, ..., y_k. But if you just want the last output y_k, then in get_output you can take the last element of the sequence: y = forward + backward; return y[:, -1, :]
Like this?
```python
def get_output(self, train):
    forward = self.get_forward_output(train)
    backward = self.get_backward_output(train)
    # Note: compare strings with ==, not `is` (identity comparison is a bug here).
    if self.output_mode == 'sum':
        output = forward + backward
    elif self.output_mode == 'concat':
        output = T.concatenate([forward, backward], axis=2)
    else:
        raise Exception("output_mode must be 'sum' or 'concat'")
    if self.return_sequences:
        return output
    return output[:, -1, :]
```
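As a sanity check on the two merge modes (a NumPy sketch with made-up shapes, not the actual Theano code): 'sum' keeps the hidden dimension unchanged, while 'concat' along axis=2 doubles it.

```python
import numpy as np

# Hypothetical shapes: 3 examples, 5 timesteps, 4 hidden units per direction.
forward = np.random.randn(3, 5, 4)
backward = np.random.randn(3, 5, 4)

# 'sum' merges the two directions elementwise: feature dimension is unchanged.
summed = forward + backward
assert summed.shape == (3, 5, 4)

# 'concat' stacks the two directions along the feature axis (axis=2),
# so the next layer must expect twice the hidden size.
concatenated = np.concatenate([forward, backward], axis=2)
assert concatenated.shape == (3, 5, 8)
```

Worth keeping in mind when wiring up the next layer: with output_mode='concat' the following layer would need an input size of 2 * hidden_size.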
In the Keras LSTM, return_sequences is handled this way:
```python
if self.return_sequences:
    return outputs.dimshuffle((1, 0, 2))
return outputs[-1]
```
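For context: theano.scan stacks its per-step results along a leading time axis, so outputs has shape (seq_len, num_examples, hidden), and dimshuffle((1, 0, 2)) moves the batch axis first. A minimal NumPy sketch of the same permutation (np.transpose standing in for dimshuffle, shapes made up):

```python
import numpy as np

# Hypothetical shapes: 5 timesteps, 3 examples, 4 hidden units.
seq_len, num_examples, hidden = 5, 3, 4

# scan output: time axis first.
outputs = np.random.randn(seq_len, num_examples, hidden)

# dimshuffle((1, 0, 2)) corresponds to np.transpose(outputs, (1, 0, 2)):
# downstream layers then see (num_examples, seq_len, hidden).
sequences = np.transpose(outputs, (1, 0, 2))
assert sequences.shape == (num_examples, seq_len, hidden)

# outputs[-1] is the final timestep for every example: (num_examples, hidden).
last = outputs[-1]
assert last.shape == (num_examples, hidden)

# The last step of the shuffled tensor matches outputs[-1].
assert np.array_equal(sequences[:, -1, :], last)
```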
Are you sure the tensor doesn't need to be transposed? To plug into the other Keras modules?
Edit: sorry, I just realized you handled the transposition in get_forward_output and get_backward_output.
What you did is correct: after the dimension shuffle in get_forward_output and get_backward_output, output has shape (num_examples, seq_len, hidden), so output[:, -1, :] takes the last timestep.
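A quick NumPy illustration of that slice (shapes are made up): with output of shape (num_examples, seq_len, hidden), output[:, -1, :] keeps every example's final timestep.

```python
import numpy as np

num_examples, seq_len, hidden = 3, 5, 4
output = np.random.randn(num_examples, seq_len, hidden)

# Take the last timestep for every example in the batch.
last = output[:, -1, :]
assert last.shape == (num_examples, hidden)

# Equivalently, example by example: last[i] is output[i] at the final step.
assert np.array_equal(last[0], output[0, seq_len - 1])
```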
Ok, it seems to be working. I'm using two stacked LSTMs like this:
```python
model = Sequential()
model.add(BiDirectionLSTM(embedding_size, hidden_size, init=initialize))
model.add(Dense(hidden_size, hidden_size, init=initialize))
model.add(Activation('relu'))
model.add(RepeatVector(maxlen))
model.add(BiDirectionLSTM(hidden_size, hidden_size, return_sequences=True, init=initialize))
model.add(TimeDistributedDense(hidden_size, output_size, activation='softmax', init=initialize))
```
And it seems to be training OK (I'm actually trying to decide whether I need two bidirectional LSTMs or just one on the input side).
I've forked the repo (sorry, a bit new to GitHub), but I can add the changes here if you like.
Sure, feel free to add the changes and create a pull request if you can.
Nice work, this looks very promising.
I was thinking of modifying it for something I'm working on, since it doesn't seem to implement return_sequences, so you can't return a whole sequence of outputs. Or am I missing something?
If you want to give me a head start, how do you imagine modifying this to return a sequence of outputs? :)