keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

An LSTM model to return probability distribution from the softmax for every timestamp in the sequence #5262

Closed dupsys closed 7 years ago

dupsys commented 7 years ago

Building our model using an LSTM Recurrent Network

print('Building training model...')
hiddenStateSize = 256
hiddenLayerSize = 256
timesteps = 8

model = Sequential()
model.add(LSTM(hiddenStateSize, return_sequences=True, stateful=True,
               batch_input_shape=(maxSequenceLength, len(char_2_id), 1),
               forget_bias_init='one'))
model.add(Dropout(0.3))
model.add(TimeDistributed(Dense(hiddenLayerSize)))
model.add(TimeDistributed(Activation('relu')))

model.add(TimeDistributed(Dense(len(char_2_id))))
model.add(TimeDistributed(Activation('softmax')))

%time model.compile(loss='categorical_crossentropy', optimizer = RMSprop(lr=0.001), metrics=['accuracy'])

Print the details about the network model

print(model.summary())

Test a simple prediction on a batch for this model.

print("Sample input Batch size:"), print(inputChars[0:32, :, :].shape) print("Sample input Batch labels (nextChars):"), print(nextChars[0:32, :, :].shape) outputs = model.predict(inputChars[0:32, :, :]) print("Output Sequence size:"), print(outputs.shape)

I got this error: ValueError: Error when checking : expected lstm_input_3 to have shape (173, 38, 1) but got array with shape (32, 173, 38)

kentsommer commented 7 years ago

Hey @dupsys,

This is simply an error saying that the shape of the input you are feeding the model is different from the shape you specified in the model itself.

This line batch_input_shape=(maxSequenceLength, len(char_2_id),1) will, in your case, end up with the shape (173, 38, 1). Note here that because you use batch_input_shape, you are saying the batch_size will be 173 and the data will be in the shape (38,1).

However, the input you are trying to give the network is (32, 173, 38). This is a batch size of 32.

Unless I'm misunderstanding your task, I believe what you want to do is the following:
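A minimal sketch of that change, reconstructed from the follow-up code rather than quoted from the original reply, with the batch dimension first and left unspecified:

```python
# Sketch: the batch dimension comes first; None leaves the batch size unspecified.
model.add(LSTM(hiddenStateSize, return_sequences=True, stateful=True,
               batch_input_shape=(None, maxSequenceLength, len(char_2_id)),
               forget_bias_init='one'))
```

(As the next comments show, with stateful=True the batch dimension cannot actually be left as None.)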

dupsys commented 7 years ago

Hi @kentsommer, thank you so much for the response. I am completely new to this environment, but I will catch up very soon.

I inserted the batch_input_shape as suggested, but I get an error saying:

If an RNN is stateful, a complete input_shape must be provided (including batch size).

Building our model using an LSTM Recurrent Network

We build our model as a recurrent neural network (RNN) using the Keras libraries. The input and the output of the network are characters (one-hot encoded) in the following format:

batch_size:

maxSequenceLength:

charVoc:

The output does not contain the one-hot decoding of the input but a probability distribution from the softmax for every time step in the sequence; a short decoding sketch after the prediction code below shows how to read it.

print('Building training model...')
hiddenStateSize = 256
hiddenLayerSize = 256
timesteps = 8

We define one layer after another; in Keras this is called a Sequential model.

model = Sequential()

The outputs of the LSTM layer are the hidden states of the LSTM for every time step.

model.add(LSTM(hiddenStateSize, return_sequences=True, stateful=True,
               batch_input_shape=(None, maxSequenceLength, len(char_2_id)),
               forget_bias_init='one'))
model.add(Dropout(0.3))
model.add(TimeDistributed(Dense(hiddenLayerSize)))
model.add(TimeDistributed(Activation('relu')))

Add another dense layer with the desired output size.

model.add(TimeDistributed(Dense(len(char_2_id))))
model.add(TimeDistributed(Activation('softmax')))

Optimization method: in this case we use RMSprop with a learning rate of 0.001. RMSprop is commonly used for RNNs instead of regular SGD, while categorical_crossentropy is the same loss used for classification problems with a softmax output.

%time model.compile(loss='categorical_crossentropy', optimizer = RMSprop(lr=0.001), metrics=['accuracy'])

Print the details about the network model

print(model.summary())

Test a simple prediction on a batch for this model.

print("Sample input Batch size:"), print(inputChars[0:32, :, :].shape) print("Sample input Batch labels (nextChars):"), print(nextChars[0:32, :, :].shape) outputs = model.predict(inputChars[0:32, :, :]) print("Output Sequence size:"), print(outputs.shape)

kentsommer commented 7 years ago

@dupsys

Ah sorry, I forgot about that. Then because you are feeding in a batch_size of 32, you should just need to set it to: batch_input_shape=(32, maxSequenceLength, len(char_2_id))
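Applied to the layer above, that suggestion would look roughly like this (a sketch; the other arguments are kept unchanged):

```python
# Stateful LSTMs need a fully specified batch_input_shape, including the batch size.
model.add(LSTM(hiddenStateSize, return_sequences=True, stateful=True,
               batch_input_shape=(32, maxSequenceLength, len(char_2_id)),
               forget_bias_init='one'))
```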

dupsys commented 7 years ago

Then what should my batch_size be when fitting the model, given that I have 10000 samples?

h = model.fit(inputChars, nextChars, batch_size = 32, nb_epoch = 50, show_accuracy=True, verbose=1)

because I get this error: In a stateful network, you should only pass inputs with a number of samples that can be divided by the batch size. Found: 10000 samples
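One common way to satisfy that constraint (a sketch, not taken from this thread) is to trim the training data to the largest multiple of the batch size, e.g. 10000 -> 9984 for a batch size of 32:

```python
# Drop the trailing samples so the sample count is divisible by the batch size.
batch_size = 32
n_usable = (inputChars.shape[0] // batch_size) * batch_size   # 10000 -> 9984
h = model.fit(inputChars[:n_usable], nextChars[:n_usable],
              batch_size=batch_size, nb_epoch=50, verbose=1)
```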

kentsommer commented 7 years ago

@dupsys

I suggest you take a deeper look at the documentation from which you are pulling your code:

http://www.cs.virginia.edu/~vicente/recognition/notebooks/kerasLSTM.html

Specifically section 5.

dupsys commented 7 years ago

@kentsommer

okay, thanks.

dupsys commented 7 years ago

Hi @kentsommer,

I am trying to implement this paper https://arxiv.org/pdf/1605.03481.pdf with the following code.

# define the symbol alphabet, including the padding symbol '#', and create a mapping of unique chars to integers
chars = 'X' + ''.join(sorted({c for s in raw_text for c in s}))
char_to_int = {c: i for i, c in enumerate(chars)}

compute the size of our alphabet, e.g. the vocabulary (letter) size in the training dataset

n_chars = len(raw_text)

maximum word length; the vocabulary of distinct characters is much larger than the 26 letters of the alphabet.

max_length = max(map(len, raw_text.split()))

decoding and encoding

def encode_string(s):
    return [char_to_int[c] for c in s]

def decode_string(a):
    return ''.join(chars[i] for i in a)

def encode_and_pad(words):
    return pad_sequences(list(map(encode_string, words)), padding='post', maxlen=max_length)

Our inputs (x) will be the participles

padded_input = encode_and_pad(chars)

load the dataset but only keep the top n words, zero the rest

X_train, X_test = train_test_split(padded_input, test_size=0.20, random_state=seed)
X_train = sequence.pad_sequences(X_train, maxlen=max_length)
X_test = sequence.pad_sequences(X_test, maxlen=max_length)

model construction

sequence_length = X_train.shape[1]
vocabulary_size = len(chars)
embedding_dim = 256
decoder_size = 128
filter_sizes = [7, 3]
num_filters = 256
drop = 0.5
nb_epoch = 10
batch_size = 5

we define what the input shape looks like:

inputs = Input(shape=(sequence_length, vocabulary_size), name='input', dtype='float32')

all the convolutional layers...

conv = Convolution1D(num_filters, filter_length=filter_sizes[0], border_mode='valid',
                     activation='relu', input_shape=(sequence_length, vocabulary_size))(inputs)
pool = MaxPooling1D(pool_length=filter_sizes[1])(conv)
conv1 = Convolution1D(num_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(pool)
pool2 = MaxPooling1D(pool_length=filter_sizes[1])(conv1)
conv3 = Convolution1D(num_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(pool2)
conv4 = Convolution1D(num_filters, filter_length=filter_sizes[0], border_mode='valid', activation='relu')(conv3)

encode with LSTM

encoding_layer = LSTM(conv4, embedding_dim, return_sequences=True)
rep_in = RepeatVector(encoding_layer)
l_decoder_1 = LSTM(rep_in, decoder_size, name='decoder1')
l_decoder_2 = LSTM(l_decoder_1, decoder_size, name='decoder2')
l_reshape = Reshape(l_decoder_2, (-1, [2]), name='l_reshape')

The objective is to reshape the decoder LSTM outputs so that they define the actual input text or a synonym replacement of the earlier text. Running this code, I got the following:


TypeError                                 Traceback (most recent call last)
in ()
     55 l_decoder_1 = LSTM(rep_in, decoder_size, name='decoder1')
     56 l_decoder_2 = LSTM(l_decoder_1, decoder_size, name='decoder2')
---> 57 l_reshape = Reshape(l_decoder_2, (-1,[2]), name='l_reshape')

TypeError: __init__() takes exactly 2 arguments (4 given)
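The TypeError comes from passing tensors as constructor arguments: in the Keras 1.x functional API a layer is built with its hyperparameters and then called on a tensor, and Reshape takes a single target_shape tuple. A rough sketch of the same block in that style, where max_length is only an assumed number of decoding steps and the Reshape target is purely illustrative:

```python
# Layers are constructed with hyperparameters, then called on tensors.
encoding = LSTM(embedding_dim)(conv4)                     # final hidden state of the encoder
rep_in = RepeatVector(max_length)(encoding)               # repeat it for every decoding step
l_decoder_1 = LSTM(decoder_size, return_sequences=True, name='decoder1')(rep_in)
l_decoder_2 = LSTM(decoder_size, return_sequences=True, name='decoder2')(l_decoder_1)
# Reshape takes a single target_shape tuple; this flattens the decoded sequence.
l_reshape = Reshape((max_length * decoder_size,), name='l_reshape')(l_decoder_2)
```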
stale[bot] commented 7 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.