I have been looking for good solutions to implement an encoder/decoder with Keras. It seems this is not solved yet (if it is, please let me know).
I think @elanmart is the closest (he is eager to implement it and has some ideas on it). This is closely related to 'stateful RNN', so please check the following issues.
As far as I know, "stateful RNN" is planned as an upcoming update item; see #426 from @fchollet.
Hi @sq6ra!
When I wrote the addition_rnn example, I used a mix of ideas from the Sequence to Sequence Learning with Neural Networks and Learning to Execute papers. Whilst I tried to follow them as closely as possible, there were limitations. Even with this, the architecture described in the addition_rnn example should work quite well for many tasks.
The two differences from the paper come via:
For the first issue, it shouldn't be too difficult to modify the existing RNN code to "steal" the hidden state from a different model once it has finished reading, but I've not attempted such a thing. I intuitively feel that feeding the output of the first RNN as repeated input to a second RNN isn't likely to be too hurtful, though I've no empirical evaluation to back that theory up^.
^ I remember reading a paper where they performed average pooling over all the output steps of the first RNN and found it useful, but I can't recall which paper.
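For what it's worth, here is a minimal sketch of the repeated-input architecture described above, in the spirit of the addition_rnn example; the layer sizes are hypothetical and the layer names assume a Keras 1.x-or-later Sequential API:

```python
from keras.models import Sequential
from keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

# Hypothetical sizes, for illustration only
input_len, output_len, vocab_size, hidden_size = 20, 20, 100, 128

model = Sequential()
# Encoder: read the whole input sequence, keep only the final output vector
model.add(LSTM(hidden_size, input_shape=(input_len, vocab_size)))
# Feed that single vector to the decoder at every output time step
model.add(RepeatVector(output_len))
# Decoder: emit one vector per target time step
model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

The `RepeatVector` layer is what implements "reading the output of the first RNN as repeated input to a second RNN" rather than transferring the hidden state.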
As mentioned in Sequence to Sequence Learning with Neural Networks, you could avoid having two separate RNNs and instead use one (described in Figure 1) if you truly think that maintaining a single hidden state is beneficial.
Having two separate RNNs (a) increases the number of model parameters at negligible computational cost and (b) allows them to easily train the LSTM on multiple language pairs. (a) can be compensated for by increasing the single model's size, and (b) is only an issue if you have the time and data for multiple languages.
@sq6ra Currently there is no way to initialize the hidden state of an RNN, because of the line `outputs_info=T.unbroadcast(alloc_zeros_matrix(X.shape[1], self.output_dim), 1),`, which means that the hidden state only exists inside the `scan`.
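For context, here is a bare-Theano sketch of what an explicit initial state would look like; the weight names and sizes are hypothetical, not Keras internals:

```python
import numpy as np
import theano
import theano.tensor as T

input_dim, hidden_dim = 16, 32  # hypothetical sizes

X = T.tensor3('X')   # (timesteps, batch, input_dim)
h0 = T.matrix('h0')  # (batch, hidden_dim), e.g. an encoder's final state

W_in = theano.shared(np.random.randn(input_dim, hidden_dim).astype('float32'))
W_rec = theano.shared(np.random.randn(hidden_dim, hidden_dim).astype('float32'))

def step(x_t, h_tm1):
    # Plain tanh recurrence; a real LSTM would also carry gates and a cell state
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec))

# Passing h0 through outputs_info makes the initial hidden state an input to
# scan, rather than a zero matrix allocated inside the layer as above
h_seq, _ = theano.scan(step, sequences=X, outputs_info=h0)
get_states = theano.function([X, h0], h_seq)
```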
@hugman I do indeed have some working code for the general form of recurrent models. I'm currently not at home though, so it'll have to wait.
One thing to mention is that "statefulness" is quite easy to implement, but there are other obstacles on the way.
Thanks for the feedback, guys :D I've decided to implement it with Theano. Cheers!
@Smerity
I'm working on an experiment that adapts machine translation techniques to build auto-encoders.
A recently published paper did something similar; you might find it interesting:
-- A Hierarchical Neural Autoencoder for Paragraphs and Documents
https://web.stanford.edu/~jurafsky/pubs/P15-1107.pdf
I think the only thing left for Keras to fully support sequence-to-sequence learning is to fix an exception with output masking (#693): without output masking the model seems to learn to output the same thing for every input, though it works when the sequences are all of the same length.
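Until that is fixed, one workaround that I believe behaves like output masking is to zero-weight the padded time steps via Keras's temporal sample weights; `y_pad`, `Y_onehot`, and the padding id of 0 are assumptions here:

```python
import numpy as np

# Assuming `model` ends in a time-distributed softmax over the vocabulary
model.compile(loss='categorical_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')

# y_pad: (samples, timesteps) integer targets, with 0 as the padding id
weights = (y_pad != 0).astype('float32')  # zero weight on padded steps
model.fit(X, Y_onehot, sample_weight=weights)
```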
> I do indeed have some working code for the general form of recurrent models. I'm currently not at home though, so it'll have to wait.
@elanmart Could you share the mentioned code?
Very interested in this as well. Simon really helped me in this thread recently; we discussed encoding and decoding for sequence-to-sequence (variable-length sequences). I'm testing a few things right now, but I'll submit a PR if I get something useful working! This thread might help you as well (towards the bottom).
Anyone had any success with this?
Any updates on a machine translation model using Keras?
Hello guys, when I run this code I get this error:
```
Traceback (most recent call last):
  File "seq2seq.py", line 30, in <module>
```

```python
from __future__ import print_function

import sys
import argparse

import numpy as np
from keras.preprocessing.sequence import pad_sequences

from seq2seq_utils import *

ap = argparse.ArgumentParser()
ap.add_argument('-max_len', type=int, default=200)
ap.add_argument('-vocab_size', type=int, default=20000)
ap.add_argument('-batch_size', type=int, default=100)
ap.add_argument('-layer_num', type=int, default=3)
ap.add_argument('-hidden_dim', type=int, default=1000)
ap.add_argument('-nb_epoch', type=int, default=2)
ap.add_argument('-mode', default='train')
args = vars(ap.parse_args())

MAX_LEN = args['max_len']
VOCAB_SIZE = args['vocab_size']
BATCH_SIZE = args['batch_size']
LAYER_NUM = args['layer_num']
HIDDEN_DIM = args['hidden_dim']
NB_EPOCH = args['nb_epoch']
MODE = args['mode']

if __name__ == '__main__':
    print('[INFO] Loading data...')
    X, X_vocab_len, X_word_to_ix, X_ix_to_word, y, y_vocab_len, y_word_to_ix, y_ix_to_word = \
        load_data('ALG_trainig2.txt', 'MSA_training2.txt', MAX_LEN, VOCAB_SIZE)

    # Finding the length of the longest sequence
    X_max_len = max([len(sentence) for sentence in X])
    y_max_len = max([len(sentence) for sentence in y])

    # Padding zeros so all sequences have the same length as the longest one
    print('[INFO] Zero padding...')
    X = pad_sequences(X, maxlen=X_max_len, dtype='int32')
    y = pad_sequences(y, maxlen=y_max_len, dtype='int32')

    # Creating the network model
    print('[INFO] Compiling model...')
    model = create_model(X_vocab_len, X_max_len, y_vocab_len, y_max_len, HIDDEN_DIM, LAYER_NUM)

    # Finding trained weights of a previous epoch, if any
    saved_weights = find_checkpoint_file('/home/laith/deeplearning_tests/mt_project/test_seq/')

    # Training only if we chose training mode
    if MODE == 'train':
        k_start = 1
        # If any trained weights were found, load them into the model
        if len(saved_weights) != 0:
            print('[INFO] Saved weights found, loading...')
            # Extract the epoch number from 'checkpoint_epoch_<k>.hdf5'
            epoch = saved_weights[saved_weights.rfind('_') + 1:saved_weights.rfind('.')]
            model.load_weights(saved_weights)
            k_start = int(epoch) + 1

        i_end = 0
        for k in range(k_start, NB_EPOCH + 1):
            # Shuffling the training data every epoch to avoid local minima
            indices = np.arange(len(X))
            np.random.shuffle(indices)
            X = X[indices]
            y = y[indices]

            # Training 1000 sequences at a time
            for i in range(0, len(X), 1000):
                if i + 1000 >= len(X):
                    i_end = len(X)
                else:
                    i_end = i + 1000
                y_sequences = process_data(y[i:i_end], y_max_len, y_word_to_ix)

                print('[INFO] Training model: epoch {}th {}/{} samples'.format(k, i, len(X)))
                model.fit(X[i:i_end], y_sequences, batch_size=BATCH_SIZE, epochs=1, verbose=2)
            model.save_weights('checkpoint_epoch_{}.hdf5'.format(k))

    # Performing a test if we chose test mode
    else:
        # Only performing the test if there are saved weights
        if len(saved_weights) == 0:
            print("The network hasn't been trained! Program will exit...")
            sys.exit()
        else:
            X_test = load_test_data('ALG_testing2.txt', X_word_to_ix, MAX_LEN)
            X_test = pad_sequences(X_test, maxlen=X_max_len, dtype='int32')
            model.load_weights(saved_weights)
            predictions = np.argmax(model.predict(X_test), axis=2)
            sequences = []
            for prediction in predictions:
                sequence = ' '.join([y_ix_to_word[index] for index in prediction if index > 0])
                print(sequence)
                sequences.append(sequence)
            np.savetxt('test_result.txt', sequences, fmt='%s')
```
Hello guys :)
I'm wondering: is it possible to build the encoder-decoder model from the paper "Sequence to Sequence Learning with Neural Networks" with Keras?
In that model, the encoder's final hidden state is used to initialize the decoder's hidden state (h_0), rather than being fed to the decoder as input. Based on my current understanding, the model in examples/addition_rnn.py is not the same as the one above.
Suggestions and ideas would be greatly appreciated. Cheers!
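For what it's worth, newer Keras versions (the 2.x functional API, with `return_state` and `initial_state`) can express the paper's scheme directly. A minimal teacher-forcing sketch, with hypothetical sizes:

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense

hidden_size, in_dim, out_dim = 128, 100, 100  # hypothetical sizes

# Encoder: discard the per-step outputs, keep the final hidden and cell states
enc_inputs = Input(shape=(None, in_dim))
_, state_h, state_c = LSTM(hidden_size, return_state=True)(enc_inputs)

# Decoder: start from the encoder's states instead of zeros
dec_inputs = Input(shape=(None, out_dim))
dec_outputs = LSTM(hidden_size, return_sequences=True)(
    dec_inputs, initial_state=[state_h, state_c])
outputs = Dense(out_dim, activation='softmax')(dec_outputs)

model = Model([enc_inputs, dec_inputs], outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

This differs from the addition_rnn / RepeatVector approach discussed above in that the decoder receives the encoder's state directly rather than its repeated output.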