wanasit / katakana

Training machine to write Katakana using Sequence-to-Sequence technique

How to add bidirectional layer? #1

Closed xun468 closed 5 years ago

xun468 commented 6 years ago

Hey! I've been playing around with your model and I'd like to modify the LSTM encoder into a bidirectional LSTM.

encoder_input = Input(shape=(input_length,))
decoder_input = Input(shape=(output_length,))

encoder = Embedding(input_dict_size, 64, input_length=input_length, mask_zero=True)(encoder_input)
encoder = Bidirectional(LSTM(UNITS, return_sequences=True))(encoder)

decoder = Embedding(output_dict_size, 64, input_length=output_length, mask_zero=True)(decoder_input)
decoder = LSTM(UNITS*2, return_sequences=True)(decoder, initial_state=[encoder])
decoder = TimeDistributed(Dense(output_dict_size, activation="softmax"))(decoder)

model = Model(inputs=[encoder_input, decoder_input], outputs=[decoder])
model.compile(optimizer='adam', loss='categorical_crossentropy')

return model

I am getting the error

ValueError: An initial_state was passed that is not compatible with cell.state_size. Received state_spec=[InputSpec(shape=(None, 50, 128), ndim=3)]; however cell.state_size is (128, 128)

However, when I try initial_state=[encoder, encoder], I get a very long error that ends in a shape mismatch. If it is not too much trouble, could I have your thoughts on how to implement this properly?

wanasit commented 5 years ago

I apologize for the very slow response. I hope you have solved the problem.

The problematic part of your code is return_sequences=True. It makes the encoder's output a sequence of vectors instead of a single vector, so it cannot be used as the decoder's initial state.

You can check it by:

encoder = Embedding(input_dict_size, 64, input_length=INPUT_LENGTH, mask_zero=True)(encoder_input)
encoder = Bidirectional(LSTM(64, return_sequences=True))(encoder)
print(encoder.get_shape())  # => (?, ?, 128) -- a sequence of 128-dim vectors

encoder = Embedding(input_dict_size, 64, input_length=INPUT_LENGTH, mask_zero=True)(encoder_input)
encoder = Bidirectional(LSTM(64))(encoder)
print(encoder.get_shape())  # => (?, 128) -- a single 128-dim vector (64 forward + 64 backward)

The following code works for me:

encoder = Embedding(input_dict_size, 64, input_length=INPUT_LENGTH, mask_zero=True)(encoder_input)
encoder = Bidirectional(LSTM(64))(encoder)  # a single 128-dim vector: 64 forward + 64 backward

decoder = Embedding(output_dict_size, 64, input_length=OUTPUT_LENGTH, mask_zero=True)(decoder_input)
# the decoder LSTM's state_size is (128, 128), so the 128-dim encoder vector can seed both h and c
decoder = LSTM(128, return_sequences=True)(decoder, initial_state=[encoder, encoder])
decoder = TimeDistributed(Dense(output_dict_size, activation="softmax"))(decoder)
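
For completeness, here is a minimal sketch of the whole model assembled around this encoder/decoder, mirroring the Model/compile lines from the snippet in your question. The placeholder sizes at the top are just for illustration, not the values used in the repo:

from keras.layers import Input, Embedding, LSTM, Bidirectional, Dense, TimeDistributed
from keras.models import Model

# placeholder sizes -- replace with the real vocabulary sizes and sequence lengths
input_dict_size, output_dict_size = 100, 100
INPUT_LENGTH, OUTPUT_LENGTH = 20, 20

encoder_input = Input(shape=(INPUT_LENGTH,))
decoder_input = Input(shape=(OUTPUT_LENGTH,))

# encoder: bidirectional LSTM without return_sequences -> one 128-dim vector
encoder = Embedding(input_dict_size, 64, input_length=INPUT_LENGTH, mask_zero=True)(encoder_input)
encoder = Bidirectional(LSTM(64))(encoder)

# decoder: a 128-unit LSTM seeded with the encoder vector as both h and c
decoder = Embedding(output_dict_size, 64, input_length=OUTPUT_LENGTH, mask_zero=True)(decoder_input)
decoder = LSTM(128, return_sequences=True)(decoder, initial_state=[encoder, encoder])
decoder = TimeDistributed(Dense(output_dict_size, activation="softmax"))(decoder)

model = Model(inputs=[encoder_input, decoder_input], outputs=[decoder])
model.compile(optimizer='adam', loss='categorical_crossentropy')

Passing the same encoder vector as both the hidden state h and the cell state c is what makes the shapes line up with the decoder's (128, 128) state size.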