keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Implementing the model in Sequence to Sequence Learning with Neural Networks #694

Closed sq6ra closed 7 years ago

sq6ra commented 9 years ago

Hello guys : )

I'm wondering: is it possible to build the encoder-decoder model from the paper "Sequence to Sequence Learning with Neural Networks" with Keras?

In that model, the encoder's final hidden state is used to initialize the decoder's hidden state (h_0), rather than being fed as input to the decoder. Based on my current understanding, the model in examples/addition_rnn.py is not the same as that one.

It would be greatly appreciated if someone has suggestions & ideas. Cheers

hugman commented 9 years ago

I have been looking for a good solution to implement an encoder/decoder with Keras. It seems that it is not solved yet (if it is, please let me know).

I think @elanmart is the closest one (he is eager to implement it and has some ideas about it). This is closely related to 'stateful RNN', so please check the following issues.

#443 for the recurrent model and an API sketch from @elanmart

#620 for a new recurrent container idea from @EderSantana

#98 for a discussion of real-time RNN implementation

As far as I know, "stateful RNN" is planned as a next update item; check #426 from @fchollet.

Smerity commented 9 years ago

Hi @sq6ra!

When I wrote the addition_rnn example I used a mix of ideas from the Sequence to Sequence Learning with Neural Networks and Learning to Execute papers. While I tried to follow them as closely as possible, there were limitations. Even so, the architecture described in the addition_rnn example should work quite well for many tasks.

There are two main differences from the paper. The first, as you note, is how the encoding is handed to the decoder: addition_rnn feeds the encoder's output repeatedly as input to the decoder instead of using it to initialize the decoder's hidden state.

For this first issue, it shouldn't be too difficult to modify the existing RNN code to "steal" the hidden state from another model once it has finished reading, but I've not attempted such a thing. I intuitively feel that feeding the output of the first RNN as repeated input to a second RNN isn't likely to hurt much, though I have no empirical evaluation to back that theory up^.

^ I remember reading a paper where they performed average pooling over all the output steps of the first RNN and found it useful, but I can't recall which paper.
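For concreteness, the repeated-input arrangement used in addition_rnn looks roughly like this (a sketch only, written with current Keras layer names; all sizes below are placeholders):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

vocab_size, input_len, output_len, hidden_size = 12, 7, 4, 128   # placeholder sizes

model = Sequential()
# Encoder: read the one-hot input sequence and compress it into a single vector
model.add(LSTM(hidden_size, input_shape=(input_len, vocab_size)))
# Feed that encoding to the decoder at every output time step
model.add(RepeatVector(output_len))
# Decoder: emit one hidden vector per output step
model.add(LSTM(hidden_size, return_sequences=True))
# Project each step onto the output vocabulary
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')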

As mentioned in Sequence to Sequence Learning with Neural Networks, you could avoid having two separate RNNs and instead use one (described in Figure 1) if you truly think that maintaining a single hidden state is beneficial.

Having two separate RNNs (a) increases the number of model parameters at negligible computational cost and (b) makes it easy to train the LSTM on multiple language pairs. (a) can be matched by simply making a single RNN larger, and (b) only matters if you have the time and data for multiple languages.

elanmart commented 9 years ago

@sq6ra Currently there is no way to initialize the hidden state of an RNN because of the line outputs_info=T.unbroadcast(alloc_zeros_matrix(X.shape[1], self.output_dim), 1), which means that the hidden state only exists inside the scan.
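A toy Theano sketch of that point (not Keras internals; the weight shapes and the plain tanh step are made up for illustration): whatever is passed as outputs_info becomes the initial hidden state, so an encoder-initialized decoder would need it to be a graph input rather than a zero matrix allocated inside the layer.

import numpy as np
import theano
import theano.tensor as T

n_in, n_hid = 8, 16                     # toy sizes
W_ih = theano.shared(np.random.randn(n_in, n_hid).astype(theano.config.floatX))
W_hh = theano.shared(np.random.randn(n_hid, n_hid).astype(theano.config.floatX))

X = T.tensor3('X')                      # (time, batch, n_in)
h0 = T.matrix('h0')                     # (batch, n_hid), e.g. the encoder's final hidden state

def step(x_t, h_tm1):
    # Plain tanh RNN step, just to show where the initial state enters
    return T.tanh(T.dot(x_t, W_ih) + T.dot(h_tm1, W_hh))

# Passing h0 here (instead of an unbroadcast zero matrix) makes the initial
# hidden state an input to the graph rather than a constant inside the scan.
h, _ = theano.scan(step, sequences=X, outputs_info=h0)

last_hidden = theano.function([X, h0], h[-1])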

@hugman I do indeed have some working code for the general form of recurrent models. I'm currently not at home though, so it'll have to wait.

One thing to mention is that "statefulness" is quite easy to implement, but there are other obstacles along the way.

sq6ra commented 9 years ago

Thanks for the feedback, guys :D I have decided to implement it with Theano. Cheers

sq6ra commented 9 years ago

@Smerity I'm working on an experiment that adapts machine translation "ways" to build auto-encoders. One paper published recently did something similar; you might find it interesting:
-- A Hierarchical Neural Autoencoder for Paragraphs and Documents, https://web.stanford.edu/~jurafsky/pubs/P15-1107.pdf

brainwater commented 9 years ago

I think the only thing left for Keras to fully support sequence-to-sequence learning is to fix an exception with output masking (#693). Without output masking it seems to only learn to output the same thing for every input, but it works when the sequences are all of the same length.
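One way to approximate output masking in the meantime (a sketch, not the fix for #693: it assumes zero-padded one-hot targets and uses Keras' temporal sample weighting to zero out the loss on padded steps):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

vocab_size, input_len, output_len, hidden_size = 12, 7, 5, 64    # placeholder sizes

# Fake data: zero-padded one-hot targets where only the first 3 output steps are "real"
X = np.random.random((32, input_len, vocab_size))
y = np.zeros((32, output_len, vocab_size))
y[:, :3, 1] = 1.0

model = Sequential()
model.add(LSTM(hidden_size, input_shape=(input_len, vocab_size)))
model.add(RepeatVector(output_len))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
# 'temporal' lets fit() accept one weight per (sample, timestep)
model.compile(loss='categorical_crossentropy', optimizer='adam', sample_weight_mode='temporal')

# Weight of 1.0 for real target steps, 0.0 for padding, so padded steps add no loss
step_weights = y.sum(axis=-1)            # shape (32, output_len)
model.fit(X, y, sample_weight=step_weights, batch_size=8, epochs=1)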

hugman commented 9 years ago

"I do indeed have some working code for the general form of recurrent models. I'm currently not at home though, so it'll have to wait."

@elanmart Could you share the mentioned code?

NickShahML commented 8 years ago

Very interested in this as well. Simon really helped me in this thread recently. We discussed encoding and decoding for sequence to sequence (variable-length sequences). I'm testing a few things right now, but I'll submit a PR if I get something useful working! This thread might help you as well (towards the bottom):

https://github.com/fchollet/keras/issues/395

anjishnu commented 8 years ago

Anyone had any success with this?

dhruviitp commented 7 years ago

Any updates on a machine translation model using Keras?
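For what it's worth, with the Keras 2 functional API the thing this thread originally asked for (initializing the decoder with the encoder's final state) can be written directly via return_state and initial_state; a minimal sketch with placeholder sizes:

from keras.models import Model
from keras.layers import Input, LSTM, Dense

num_encoder_tokens, num_decoder_tokens, latent_dim = 70, 90, 256   # placeholder sizes

# Encoder: keep only its final hidden and cell states
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: its hidden state starts from the encoder's final state
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs = LSTM(latent_dim, return_sequences=True)(decoder_inputs,
                                                          initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

At inference time the decoder is usually run step by step, feeding its own previous prediction back in, as in the standard seq2seq setup.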

laith85 commented 7 years ago

Hello guys, when I run this code I get this error:

Traceback (most recent call last):
  File "seq2seq.py", line 30, in <module>
    X, X_vocab_len, X_word_to_ix, X_ix_to_word, y, y_vocab_len, y_word_to_ix, y_ix_to_word = load_data('ALG_trainig2.txt', 'MSA_training2.txt', MAX_LEN, VOCAB_SIZE)
ValueError: need more than 4 values to unpack

from __future__ import print_function
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import sys

import argparse
from seq2seq_utils import *

ap = argparse.ArgumentParser()
ap.add_argument('-max_len', type=int, default=200)
ap.add_argument('-vocab_size', type=int, default=20000)
ap.add_argument('-batch_size', type=int, default=100)
ap.add_argument('-layer_num', type=int, default=3)
ap.add_argument('-hidden_dim', type=int, default=1000)
ap.add_argument('-nb_epoch', type=int, default=2)
ap.add_argument('-mode', default='train')
args = vars(ap.parse_args())

MAX_LEN = args['max_len']
VOCAB_SIZE = args['vocab_size']
BATCH_SIZE = args['batch_size']
LAYER_NUM = args['layer_num']
HIDDEN_DIM = args['hidden_dim']
NB_EPOCH = args['nb_epoch']
MODE = args['mode']

if __name__ == '__main__':

    # Loading input sequences, output sequences and the necessary mapping dictionaries
    print('[INFO] Loading data...')
    X, X_vocab_len, X_word_to_ix, X_ix_to_word, y, y_vocab_len, y_word_to_ix, y_ix_to_word = load_data('ALG_trainig2.txt', 'MSA_training2.txt', MAX_LEN, VOCAB_SIZE)

    # Finding the length of the longest sequence
    X_max_len = max([len(sentence) for sentence in X])
    y_max_len = max([len(sentence) for sentence in y])

    # Padding zeros to make all sequences have the same length as the longest one
    print('[INFO] Zero padding...')
    X = pad_sequences(X, maxlen=X_max_len, dtype='int32')
    y = pad_sequences(y, maxlen=y_max_len, dtype='int32')

    # Creating the network model
    print('[INFO] Compiling model...')
    model = create_model(X_vocab_len, X_max_len, y_vocab_len, y_max_len, HIDDEN_DIM, LAYER_NUM)

    # Finding trained weights of previous epochs, if any
    saved_weights = find_checkpoint_file('/home/laith/deeplearning_tests/mt_project/test_seq/')

    # Training only if we chose training mode
    if MODE == 'train':
        k_start = 1

        # If any trained weights were found, load them into the model
        if len(saved_weights) != 0:
            print('[INFO] Saved weights found, loading...')
            # Recover the epoch number from a checkpoint filename like 'checkpoint_epoch_3.hdf5'
            epoch = saved_weights[saved_weights.rfind('_') + 1:saved_weights.rfind('.')]
            model.load_weights(saved_weights)
            k_start = int(epoch) + 1

        i_end = 0
        for k in range(k_start, NB_EPOCH + 1):
            # Shuffling the training data every epoch to avoid local minima
            indices = np.arange(len(X))
            np.random.shuffle(indices)
            X = X[indices]
            y = y[indices]

            # Training 1000 sequences at a time
            for i in range(0, len(X), 1000):
                if i + 1000 >= len(X):
                    i_end = len(X)
                else:
                    i_end = i + 1000
                y_sequences = process_data(y[i:i_end], y_max_len, y_word_to_ix)

                print('[INFO] Training model: epoch {}th {}/{} samples'.format(k, i, len(X)))
                model.fit(X[i:i_end], y_sequences, batch_size=BATCH_SIZE, epochs=1, verbose=2)
            model.save_weights('checkpoint_epoch_{}.hdf5'.format(k))

    # Performing test if we chose test mode
    else:
        # Only performing test if there are any saved weights
        if len(saved_weights) == 0:
            print("The network hasn't been trained! Program will exit...")
            sys.exit()
        else:
            X_test = load_test_data('ALG_testing2.txt', X_word_to_ix, MAX_LEN)
            X_test = pad_sequences(X_test, maxlen=X_max_len, dtype='int32')
            model.load_weights(saved_weights)

            predictions = np.argmax(model.predict(X_test), axis=2)
            sequences = []
            for prediction in predictions:
                # Map predicted indices back to words, skipping the padding index 0
                sequence = ' '.join([y_ix_to_word[index] for index in prediction if index > 0])
                print(sequence)
                sequences.append(sequence)
            np.savetxt('test_result.txt', sequences, fmt='%s')