Open philipperemy opened 5 years ago
I'm trying to replicate the web article on turning this code into an Attention demonstration: https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html
I think I have done the right things, but when running the fit I get the error message:
ValueError: Error when checking input: expected Inp_dec to have 2 dimensions, but got array with shape (103400, 20, 65)
This is the tail end of my code:
# We are predicting the next character.
# Thus, the decoder's input is the expected output shifted by the START char
training_decoder_input = np.zeros_like(training_decoder_output)
training_decoder_input[:, 1:] = training_decoder_output[:,:-1]
training_decoder_input[:, 0] = encoding.CHAR_CODE_START
training_decoder_output = np.eye(output_dict_size)[encoded_training_output.astype('int')]
# model.fit(x=[training_encoder_input, training_decoder_input], y=[training_decoder_output], …)
# training_decoder_input expects 2 dimensions, but got array with shape (103400, 20, 65)
print("Enc Shape: {}".format(training_encoder_input.shape)) # Enc Shape: (103400, 20)
print("Dec Shape: {}".format(training_decoder_input.shape)) # Dec Shape: (103400, 20, 65)
seq2seq_model.fit(
    x=[training_encoder_input, training_decoder_input],
    y=[training_decoder_output],
    validation_data=(
        [validation_encoder_input, validation_decoder_input],
        [validation_decoder_output]),
    verbose=2,
    batch_size=64,
    epochs=30)
Any ideas what I'm missing?
Cheers Fred
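For reference, here is a minimal sketch of this data-preparation step in the style of the linked article: the decoder input is built from the integer-encoded output (so it stays 2-D), and only the target is one-hot encoded. The names encoded_training_output, output_dict_size and encoding.CHAR_CODE_START are carried over from the snippet above; treat this as a sketch of the pattern, not the article's exact code.

import numpy as np

# encoded_training_output: integer character codes, shape (num_samples, output_length)
training_decoder_input = np.zeros_like(encoded_training_output)   # 2-D, same shape as the integer codes
training_decoder_input[:, 1:] = encoded_training_output[:, :-1]   # shift right by one position
training_decoder_input[:, 0] = encoding.CHAR_CODE_START           # prepend the START character
# Only the target is one-hot, shape (num_samples, output_length, output_dict_size)
training_decoder_output = np.eye(output_dict_size)[encoded_training_output.astype('int')]

Built this way, training_decoder_input keeps two dimensions, which is what a decoder Input declared as shape=(output_length,) expects.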
Hmm. I'd need to see how you built the model to know why it expects a different input dimension.
Also, I wrote that article a while ago. Could you try following the updated version on Medium instead? https://medium.com/@wanasit/english-to-katakana-with-sequence-to-sequence-in-tensorflow-a03a16ac19be
Thanks so much for replying!
I modified the model.py file the way I understood it from the web page. Of course, I'm a beginner at this, so I might have missed some obvious things. (I will take a look at your updated article and see if I can spot what to do. UPDATE: I had a look at the new article, but it didn't seem to feature the Attention part, which was the specific bit I was interested in trying.)
This is the function that assembles the model:
# Imports used by this function (assuming the Keras 2.x functional API):
from keras.layers import Input, Embedding, LSTM, Dense, Activation, TimeDistributed, dot, concatenate
from keras.models import Model

def create_model(
        input_dict_size,
        output_dict_size,
        input_length=DEFAULT_INPUT_LENGTH,      # defaults defined elsewhere in model.py
        output_length=DEFAULT_OUTPUT_LENGTH):
    # Both inputs are integer-encoded sequences, i.e. 2-D: (batch, length)
    encoder_input = Input(shape=(input_length,), name="Inp_enc")
    decoder_input = Input(shape=(output_length,), name="Inp_dec")

    encoder = Embedding(input_dict_size, 64, input_length=input_length, mask_zero=True)(encoder_input)
    encoder = LSTM(64, return_sequences=True)(encoder)  # WAS: return_sequences=False; could use unroll=True
    encoder_last = encoder[:, -1, :]  # ATTENTION add: last encoder output, used to initialise the decoder

    decoder = Embedding(output_dict_size, 64, input_length=output_length, mask_zero=True)(decoder_input)
    decoder = LSTM(64, return_sequences=True)(decoder, initial_state=[encoder_last, encoder_last])  # WAS: initial_state=[encoder, encoder]

    # Here come the Attention bits from:
    # https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html
    attention = dot([decoder, encoder], axes=[2, 2])  # alignment scores between decoder and encoder steps
    attention = Activation('softmax')(attention)      # attention weights
    context = dot([attention, encoder], axes=[2, 1])  # context = weighted sum of encoder outputs
    decoder_combined_context = concatenate([context, decoder])

    # Has another weight + tanh layer as described in equation (5) of the paper
    output = TimeDistributed(Dense(64, activation="tanh"))(decoder_combined_context)  # equation (5) of the paper
    output = TimeDistributed(Dense(output_dict_size, activation="softmax"))(output)   # equation (6) of the paper

    # Final Model
    model = Model(inputs=[encoder_input, decoder_input], outputs=[output])  # WAS: outputs=[decoder]
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model
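One way to see what each input expects is to build the model and inspect its input shapes: Inp_dec is declared as Input(shape=(output_length,)), so the model wants integer-encoded decoder sequences of shape (batch, output_length), and only the target y is one-hot. A hypothetical check, assuming for illustration a 65-symbol dictionary and length 20 on both sides (taken from the shapes printed earlier):

model = create_model(input_dict_size=65, output_dict_size=65,
                     input_length=20, output_length=20)
model.summary()
# A multi-input model reports one shape per input:
print(model.input_shape)  # [(None, 20), (None, 20)] -> Inp_dec expects (batch, 20), not (batch, 20, 65)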
Ping me