wanasit / katakana

Training machine to write Katakana using Sequence-to-Sequence technique

If you need advice on how to train your model #5

Open philipperemy opened 5 years ago

philipperemy commented 5 years ago

Ping me

Fredrum commented 4 years ago

Ok! :) @philipperemy

I'm trying to replicate the web article on turning this code into an Attention demonstration: https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html

I think I have done the right things, but when running the fit I get this error message:

ValueError: Error when checking input: expected Inp_dec to have 2 dimensions, but got array with shape (103400, 20, 65)

This is the tail end of my code:

# We are predicting the next character.
# Thus, the decoder's input is the expected output shifted right by one, starting with the START char
training_decoder_input = np.zeros_like(training_decoder_output)
training_decoder_input[:, 1:] = training_decoder_output[:,:-1]
training_decoder_input[:, 0] = encoding.CHAR_CODE_START
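# Intended effect, e.g. for a target row [k, a, t, 0, ...]:
#   decoder output row: [k, a, t, 0, ...]
#   decoder input row:  [START, k, a, t, ...]  (shifted right by one)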

training_decoder_output = np.eye(output_dict_size)[encoded_training_output.astype('int')]

# model.fit(x=[training_encoder_input, training_decoder_input], y=[training_decoder_output], …)
# training_decoder_input expects 2 dimensions, but got array with shape (103400, 20, 65)

print("Enc Shape: {}".format(training_encoder_input.shape))  # Enc Shape: (103400, 20)
print("Dec Shape: {}".format(training_decoder_input.shape))  # Dec Shape: (103400, 20, 65)

seq2seq_model.fit(
    x=[training_encoder_input, training_decoder_input],
    y=[training_decoder_output],
    validation_data=(
        [validation_encoder_input, validation_decoder_input], [validation_decoder_output]),
    verbose=2,
    batch_size=64,
    epochs=30)

Any ideas what I'm missing?

Cheers, Fred

wanasit commented 4 years ago

Hmm. I'd need to see how you built the model to know why it expects a different input dimension.

Also, I wrote that article a while ago. Could you try following the updated version on Medium instead? https://medium.com/@wanasit/english-to-katakana-with-sequence-to-sequence-in-tensorflow-a03a16ac19be
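From the error message alone, my guess is that training_decoder_input was built from an array that had already been one-hot encoded. In that model, Inp_dec feeds an Embedding layer, so it expects a 2-D array of integer character ids with shape (num_samples, output_length). A minimal sketch of the preprocessing order that keeps it 2-D (assuming encoded_training_output is the 2-D integer-encoded target array from the data preparation step):

import numpy as np

# Build the decoder input while the target is still a 2-D array of character ids
training_decoder_input = np.zeros_like(encoded_training_output)
training_decoder_input[:, 1:] = encoded_training_output[:, :-1]
training_decoder_input[:, 0] = encoding.CHAR_CODE_START  # shift right, prepend START

# One-hot encode only the target that the softmax output is compared against
training_decoder_output = np.eye(output_dict_size)[encoded_training_output.astype('int')]

# Resulting shapes:
#   training_encoder_input:  (num_samples, input_length)                    -> Inp_enc
#   training_decoder_input:  (num_samples, output_length)                   -> Inp_dec
#   training_decoder_output: (num_samples, output_length, output_dict_size) -> softmax target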

Fredrum commented 4 years ago

Thanks so much for replying!

I modified the model.py file the way I understood it from the web page. Of course, I'm a beginner at this, so I might have missed something obvious.

(I will take a look at your updated article and see if I can spot what to do.) UPDATE: I had a look at your new article, but it didn't seem to feature the Attention part, which was the specific bit I was interested in trying.

This is the function that assembles the model:
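(For context, these are the imports this snippet assumes, with Keras 2.x:)

from keras.layers import Input, Embedding, LSTM, Dense, TimeDistributed, Activation, dot, concatenate
from keras.models import Model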

def create_model(
        input_dict_size,
        output_dict_size,
        input_length=DEFAULT_INPUT_LENGTH,
        output_length=DEFAULT_OUTPUT_LENGTH):

    encoder_input = Input(shape=(input_length,), name="Inp_enc")
    decoder_input = Input(shape=(output_length,), name="Inp_dec")

    encoder = Embedding(input_dict_size, 64, input_length=input_length, mask_zero=True)(encoder_input)
    encoder = LSTM(64, return_sequences=True)(encoder)  # WAS: return_sequences=False; could use unroll=True
    encoder_last = encoder[:, -1, :]  # added for attention: the last encoder hidden state

    decoder = Embedding(output_dict_size, 64, input_length=output_length, mask_zero=True)(decoder_input)
    decoder = LSTM(64, return_sequences=True)(decoder, initial_state=[encoder_last, encoder_last])  # WAS:  initial_state=[encoder, encoder]

    # Here comes Attention bits from:
    # https://wanasit.github.io/attention-based-sequence-to-sequence-in-keras.html
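    # Shapes here: encoder is (batch, input_length, 64), decoder is (batch, output_length, 64).
    # The dot over axes [2, 2] scores every decoder step against every encoder step,
    # giving (batch, output_length, input_length); the softmax normalizes over the
    # encoder steps (the last axis).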

    attention = dot([decoder, encoder], axes=[2, 2])
    attention = Activation('softmax')(attention)

    context = dot([attention, encoder], axes=[2, 1])
    decoder_combined_context = concatenate([context, decoder])
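    # context: (batch, output_length, 64), the attention-weighted sum of encoder states;
    # concatenating with decoder gives (batch, output_length, 128)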

    # Another weight + tanh layer, as described in equation (5) of the paper
    output = TimeDistributed(Dense(64, activation="tanh"))(decoder_combined_context)
    output = TimeDistributed(Dense(output_dict_size, activation="softmax"))(output)  # equation (6) of the paper

    # Final Model
    model = Model(inputs=[encoder_input, decoder_input], outputs=[output])  # WAS: outputs=[decoder]
    model.compile(optimizer='adam', loss='binary_crossentropy')
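    # note: with one-hot targets and a softmax output, 'categorical_crossentropy'
    # is the more conventional loss choice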

    return model