Could you clarify your particular question?
I mean that in the training of an LSTM or RNN we use standard forward and backward propagation, and in each forward/backward pass the training input X and the target Y are two sequences of the same length. But in the Keras LSTM example for text generation, in the model.fit() call, for each pair of data the X is a sequence of letters while the Y is only a single letter. So I'm confused about this; I don't know if it works the standard way. Thank you.
The RNN (an LSTM is a specific type of RNN) in the example feeds the last timestep into a fully-connected layer with as many outputs as there are distinct characters, on which it performs a softmax. The loss is categorical cross-entropy, so this is a classification problem: it essentially maps the input text sequence to the most likely following character. Makes sense?
Also, please use code highlighting when posting examples.
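To make that concrete, here is a minimal sketch of that setup (illustrative only, not the exact example script; the layer size of 128 and the `maxlen` / `vocab_size` values are placeholders):

```python
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM

maxlen, vocab_size = 40, 60  # placeholder window length and alphabet size

model = Sequential()
# return_sequences defaults to False: only the hidden state of the last
# timestep is passed on, so the LSTM output has shape (batch, 128)
model.add(LSTM(128, input_shape=(maxlen, vocab_size)))
# one score per character in the alphabet, turned into a distribution
model.add(Dense(vocab_size))
model.add(Activation('softmax'))
# categorical cross-entropy: classify which character comes next
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```

Each training sample is one window of `maxlen` one-hot encoded characters, and its label is the one-hot encoding of the single character that follows that window.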
Thank you, but I mean the way Keras calculates the forward and backward propagation. Just look at an example: here is a standard RNN implementation written by Andrej Karpathy (@karpathy); you can find it on GitHub:

```python
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np

# data I/O
data = open('yahoo.txt', 'r').read()  # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print 'data has %d characters, %d unique.' % (data_size, vocab_size)
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

# hyperparameters
hidden_size = 100    # size of hidden layer of neurons
seq_length = 40      # number of steps to unroll the RNN for
learning_rate = 1e-1

# model parameters
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input to hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden to output
bh = np.zeros((hidden_size, 1))  # hidden bias
by = np.zeros((vocab_size, 1))   # output bias


def lossFun(inputs, targets, hprev):
  """
  inputs, targets are both lists of integers.
  hprev is Hx1 array of initial hidden state
  returns the loss, gradients on model parameters, and last hidden state
  """
  xs, hs, ys, ps = {}, {}, {}, {}
  hs[-1] = np.copy(hprev)
  loss = 0
  # forward pass: one loss term per timestep
  for t in xrange(len(inputs)):
    xs[t] = np.zeros((vocab_size, 1))  # encode in 1-of-k representation
    xs[t][inputs[t]] = 1
    hs[t] = np.tanh(np.dot(Wxh, xs[t]) + np.dot(Whh, hs[t-1]) + bh)  # hidden state
    ys[t] = np.dot(Why, hs[t]) + by  # unnormalized log probabilities for next chars
    ps[t] = np.exp(ys[t]) / np.sum(np.exp(ys[t]))  # probabilities for next chars
    loss += -np.log(ps[t][targets[t], 0])  # softmax (cross-entropy loss)
  # backward pass: compute gradients going backwards
  dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
  dbh, dby = np.zeros_like(bh), np.zeros_like(by)
  dhnext = np.zeros_like(hs[0])
  for t in reversed(xrange(len(inputs))):
    dy = np.copy(ps[t])
    dy[targets[t]] -= 1  # backprop into y
    dWhy += np.dot(dy, hs[t].T)
    dby += dy
    dh = np.dot(Why.T, dy) + dhnext  # backprop into h
    dhraw = (1 - hs[t] * hs[t]) * dh  # backprop through tanh nonlinearity
    dbh += dhraw
    dWxh += np.dot(dhraw, xs[t].T)
    dWhh += np.dot(dhraw, hs[t-1].T)
    dhnext = np.dot(Whh.T, dhraw)
  for dparam in [dWxh, dWhh, dWhy, dbh, dby]:
    np.clip(dparam, -5, 5, out=dparam)  # clip to mitigate exploding gradients
  return loss, dWxh, dWhh, dWhy, dbh, dby, hs[len(inputs)-1]


def sample(h, seed_ix, n):
  """
  sample a sequence of integers from the model
  h is memory state, seed_ix is seed letter for first time step
  """
  x = np.zeros((vocab_size, 1))
  x[seed_ix] = 1
  ixes = []
  for t in xrange(n):
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
    y = np.dot(Why, h) + by
    p = np.exp(y) / np.sum(np.exp(y))
    ix = np.random.choice(range(vocab_size), p=p.ravel())
    x = np.zeros((vocab_size, 1))
    x[ix] = 1
    ixes.append(ix)
  return ixes


n, p = 0, 0
mWxh, mWhh, mWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
mbh, mby = np.zeros_like(bh), np.zeros_like(by)  # memory variables for Adagrad
smooth_loss = -np.log(1.0 / vocab_size) * seq_length  # loss at iteration 0
while True:
  # prepare inputs (sweeping from left to right in steps seq_length long)
  if p + seq_length + 1 >= len(data) or n == 0:
    hprev = np.zeros((hidden_size, 1))  # reset RNN memory
    p = 0  # go from start of data
    v = np.random.randint(p, p + seq_length)
  inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
  targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]

  # sample from the model now and then
  if n % 100 == 0:
    sample_ix = sample(hprev, inputs[v], 200)
    txt = ''.join(ix_to_char[ix] for ix in sample_ix)
    print '----\n %s \n----' % (txt, )

  # forward seq_length characters through the net and fetch gradient
  loss, dWxh, dWhh, dWhy, dbh, dby, hprev = lossFun(inputs, targets, hprev)
  smooth_loss = smooth_loss * 0.999 + loss * 0.001
  if n % 100 == 0:
    print 'iter %d, loss: %f' % (n, loss)  # print progress

  # perform parameter update with Adagrad
  for param, dparam, mem in zip([Wxh, Whh, Why, bh, by],
                                [dWxh, dWhh, dWhy, dbh, dby],
                                [mWxh, mWhh, mWhy, mbh, mby]):
    mem += dparam * dparam
    param += -learning_rate * dparam / np.sqrt(mem + 1e-8)  # adagrad update

  p += seq_length  # move data pointer
  n += 1  # iteration counter
```
As you can see, the training pairs are:
```python
inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]
```
but in Keras model.fit(), for each pair of data, the X is a sequence of letters while the Y is only a single letter. I don't know whether Keras does the forward and backpropagation in the same way, and how. Thank you!
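To put the two framings side by side, here is a toy illustration (my own sketch, not taken from either script; `text` is just a placeholder string):

```python
text = "hello world"
seq_length = 4

# Karpathy-style pair: the target is the input shifted by one character,
# i.e. one target character per timestep
x_seq = text[0:seq_length]      # 'hell'
y_seq = text[1:seq_length + 1]  # 'ello'  (same length as the input)

# Keras-example-style pair: the target is only the single character
# that follows the window, i.e. one label per window
x_win = text[0:seq_length]      # 'hell'
y_win = text[seq_length]        # 'o'     (one character per sample)
```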
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.
Hi! Recently I'm using the Keras LSTM, but I'm puzzled about an implementation detail of the model.fit() function: how does it work for training an LSTM? Let me put it concretely. In the official example of LSTM for text generation below:
```python
'''Example script to generate text from Nietzsche's writings.

At least 20 epochs are required before the generated text
starts sounding coherent.

It is recommended to run this script on GPU, as recurrent
networks are quite computationally intensive.

If you try this script on new data, make sure your corpus
has at least ~100k characters. ~1M is better.
'''
from __future__ import print_function
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys

path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
# raw string so the backslashes in the Windows path are kept literal
text = open(r"C:\SciSoft\my_code\train.txt").read().lower()
print('corpus length:', len(text))

chars = set(text)
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
X = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

# build the model: 2 stacked LSTM
print('Build model...')
model = Sequential()
# return_sequences must be True here so the second LSTM receives a full sequence
model.add(LSTM(200, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(Dropout(0.2))
model.add(LSTM(512, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')


def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))


# train the model, output generated text after each iteration
for iteration in range(1, 60):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(X, y, batch_size=128, nb_epoch=1)
```
there is "model.fit(X, y, batch_size=128, nb_epoch=1) ", X is sentences, y is the next char corresponding to each sentences of X, but in standard backpropagation, the target y is a sequence as long as the input X, like I showed below:(it's from character-level rnn by Andrej karpathy)
```python
inputs = [char_to_ix[ch] for ch in data[p:p+seq_length]]
targets = [char_to_ix[ch] for ch in data[p+1:p+seq_length+1]]
```
So how does Keras model.fit() work for RNN training? Hope someone can help me, thanks in advance!
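For reference, here is a dummy-data sketch of the arrays that go into that fit() call (made-up sizes, just to show the shapes I mean; as far as I understand, the loss is applied only to the last-timestep output, and my question is whether the gradients then flow back through all maxlen timesteps as in the standard unrolled RNN):

```python
import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM

n_samples, maxlen, n_chars = 1000, 40, 60  # made-up sizes for illustration

# random one-hot data standing in for the real windows and next-chars
X_dummy = np.zeros((n_samples, maxlen, n_chars), dtype=np.bool)
y_dummy = np.zeros((n_samples, n_chars), dtype=np.bool)
for i in range(n_samples):
    X_dummy[i, np.arange(maxlen), np.random.randint(n_chars, size=maxlen)] = 1
    y_dummy[i, np.random.randint(n_chars)] = 1

model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, n_chars)))  # emits only the last timestep
model.add(Dense(n_chars))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# X has shape (n_samples, maxlen, n_chars); y has shape (n_samples, n_chars):
# one label per window, not one label per timestep
model.fit(X_dummy, y_dummy, batch_size=128, nb_epoch=1)
```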