keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0

Add a character-based RNN example. #197

Closed bskaggs closed 9 years ago

bskaggs commented 9 years ago

@karpathy's character-based RNN in Torch has gotten a great deal of attention recently after his blog post entitled "The Unreasonable Effectiveness of Recurrent Neural Networks".

It would be nice if there were an example of doing the same thing in Keras.

fchollet commented 9 years ago

Completely agreed. It's possible in Keras but relatively inefficient due to the way our LSTM/GRU layers work right now.

It's on the backlog.

ganarajpr commented 9 years ago

@fchollet Can you explain a bit more as to why it would be inefficient? I am currently attempting exactly this (in Keras!), so your pointers would be helpful.

fchollet commented 9 years ago

The recurrent layers in Keras have to process every sample from its first time step to the last. These layers are stateless (memory is cleared after every sample).

So if you've used Keras to generate samples from t=0 to t=n, in order to generate the sample at t=n+1 you will have to re-input samples 0..n. You will do n steps, whereas if you conserved the state of the memory you would only have to do one step.

It doesn't change the quality of the results, it's just slower.

We will add stateful RNNs soon to solve this.
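
For illustration, here is a minimal sketch (not from this thread; all names are illustrative) of the stateless generation loop being described: to sample the character at t=n+1 the whole prefix 0..n is fed through the network again, so generating T characters costs on the order of T^2 timesteps instead of T.

import numpy as np

# `model` is assumed to be an already-trained Keras model mapping a one-hot
# sequence of shape (1, t, n_chars) to per-timestep character probabilities.
def sample_stateless(model, n_chars, length, start_index=0):
    seq = np.zeros((1, length + 1, n_chars))
    seq[0, 0, start_index] = 1  # start-of-sequence marker
    for t in range(length):
        # The whole prefix 0..t is re-processed at every step, because the
        # layer state is cleared between calls.
        probs = model.predict(seq[:, :t + 1, :])[0, -1, :]
        next_char = np.random.choice(n_chars, p=probs / probs.sum())
        seq[0, t + 1, next_char] = 1
    return seq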

Tener commented 9 years ago

@ganarajpr I would like to experiment with this in Keras too, but I can't get my model to work. Would you be so kind as to share an example based on your work?

jonilaserson commented 9 years ago

Here is a code sample. This code divides a long character string into chunks of 200 characters, and it learns a model for the next character given the previous ones. At the end it (inefficiently) generates 128 sentences, each of 200 characters.

import numpy
import sys
sys.path.append('/home/USER/python/keras/')

# Obtain the corpus of character sequence to train from.
# Here it is just the sequence 123456789 repeated 100000 times.
x = "123456789"*100000

# Construct a dictionary, and the reverse dictionary for the participating chars.
# '*' is a 'start-sequence' character.
dct = ['*'] + list(set(x))
max_features = len(dct)
rev_dct = [(j, i) for i, j in enumerate(dct)]
rev_dct = dict(rev_dct)

# Convert the characters to their dct indexes. 
x = [rev_dct[ch] for ch in x]

# Divide the corpus into substrings of length 200.
n_timestamps = 200
x = x[:len(x)- len(x) % n_timestamps]
x = numpy.array(x, dtype='int32').reshape((-1, n_timestamps))

# Generate input and output per substring, as an indicator matrix.
y = numpy.zeros((x.shape[0], x.shape[1], max_features), dtype='int32')
for i in numpy.arange(x.shape[0]):
    for j in numpy.arange(x.shape[1]):
        y[i, j, x[i, j]] = 1        

# Shift the input sequences one step to the right, and make them start with '*'.
x = numpy.roll(y, 1, axis=1)
x[:, 0, :] = 0
x[:, 0, 0] = 1

# Build the model.
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense, Dropout, Activation
from keras.layers.recurrent import LSTM

model = Sequential()
model.add(LSTM(max_features, 256, return_sequences=True))
model.add(LSTM(256, 256, return_sequences=True))
model.add(LSTM(256, 256, return_sequences=True))
model.add(TimeDistributedDense(256, max_features))
model.add(Activation('time_distributed_softmax'))

model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

model.fit(x, y, batch_size=64, nb_epoch=50)

# Sample 128 sentences (200 characters each) from model.

def mnrnd(probs):
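    # Inverse-CDF sampling: subtract probabilities from a uniform random draw
    # and return the index at which the draw is exhausted.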
    rnd = numpy.random.random()
    for i in xrange(len(probs)):
        rnd -= probs[i]
        if rnd <= 0:
            return i
    return i

sentences = numpy.zeros((128, n_timestamps+1, max_features))
sentences[:, 0, 0] = 1

# Start sampling char-sequences. At each iteration i the probability over
# the i-th character of each sequence is computed.
for i in numpy.arange(n_timestamps):
    probs = model.predict_proba(sentences)[:,i,:]
    # Go over each sequence and sample the i-th character.
    for j in numpy.arange(len(sentences)):
        sentences[j, i+1, mnrnd(probs[j, :])] = 1
sentences = [sentence[1:].nonzero()[1] for sentence in sentences]

# Convert to readable text.
text = []
for sentence in sentences:
    text.append(''.join([dct[word] for word in sentence]))

fchollet commented 9 years ago

@jonilaserson thanks for the example!

Tener commented 9 years ago

Thank you for sharing @jonilaserson ! I'll try to contribute an example too if I manage to get something interesting working.

fchollet commented 9 years ago

Due to the simple nature of the data, the example could be greatly simplified (a single LSTM instead of 3 stacked LSTM, 128 hidden dimensions instead of 256, 5 epochs instead of 50).
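
As a concrete illustration, here is a sketch of that simplified setup, written against the same old-style LSTM(input_dim, output_dim) API as the snippet above and reusing its x, y and max_features:

from keras.models import Sequential
from keras.layers.core import TimeDistributedDense, Activation
from keras.layers.recurrent import LSTM

# One 128-unit LSTM instead of three stacked 256-unit layers.
model = Sequential()
model.add(LSTM(max_features, 128, return_sequences=True))
model.add(TimeDistributedDense(128, max_features))
model.add(Activation('time_distributed_softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Five epochs are enough on this toy corpus.
model.fit(x, y, batch_size=64, nb_epoch=5)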

Here's a generated sequence:

95346779123456789123456789123456789123456789123456789123456789123456789123456789123456789123456889123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123455678123456789123456789123456

Remarkably the network starts being "lost" in the first few characters, then gets back on its feet as soon as it finds a 1.

jonilaserson commented 9 years ago

I fed the Hebrew Bible to the 3-layer LSTM network and generated some text, it was a lot of fun.

fchollet commented 9 years ago

@jonilaserson that sounds really cool! Do you want to add the code / data to our example folder? I think a lot of people would potentially be interested in it.

Tener commented 9 years ago

One thing probably worth changing in the code is the use of 'xrange', which doesn't exist in Python 3; Keras targets that version too.
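
For what it's worth, a Python 3-friendly replacement for the xrange-based mnrnd helper above could be sketched with numpy (this is not part of the original snippet):

import numpy

def mnrnd(probs):
    # numpy.random.choice draws an index according to the given probabilities
    # and behaves the same under Python 2 and 3.
    probs = numpy.asarray(probs, dtype='float64')
    return numpy.random.choice(len(probs), p=probs / probs.sum())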

fchollet commented 9 years ago

We now have a character-level text generation LSTM example: https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py

jonilaserson commented 9 years ago

Aren't you missing a lot of signal by using only the last character of each subsequence as the label? I think you can do a lot more with a lot less by using the TimeDistributedDense.

jonilaserson commented 9 years ago

@fchollet, I have the Bible in a 2.5M text file. Where would be a suitable place for it?

fchollet commented 9 years ago

Aren't you missing a lot of signal by using only the last character of each subsequence as the label? I think you can do a lot more with a lot less by using the TimeDistributedDense.

Maybe. What sampling strategy would you use when outputting sequences? If it turns out to converge faster, I'll edit the example to switch to sequence generation instead of character-by-character generation.

I have the bible in a 2.5M text file. Where would be suitable place for it?

Anywhere publicly accessible where the data can stay in the long term. I recommend Amazon S3.

jonilaserson commented 9 years ago

You can check that strategy in the code I published earlier in this thread. If the text is

"There are four lights."

Then that sequence should be the label, and the input sequence should be:

"*There are four lights"

Where '*' marks the beginning of a sentence.
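
A minimal sketch of that input/target construction (the helper name is illustrative; dct and rev_dct follow the conventions of the code posted earlier, with '*' as the start marker):

import numpy

def make_shifted_pair(text, dct, rev_dct):
    # Target y: the one-hot encoded sequence itself.
    # Input x: the same sequence shifted one step to the right and started
    # with '*', so at every timestep the model predicts the next character.
    idx = [rev_dct[ch] for ch in text]
    y = numpy.zeros((len(idx), len(dct)), dtype='int32')
    y[numpy.arange(len(idx)), idx] = 1
    x = numpy.roll(y, 1, axis=0)
    x[0, :] = 0
    x[0, rev_dct['*']] = 1
    return x, y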

fchollet commented 9 years ago

@jonilaserson I've been running your script with my Nietzsche corpus, but it keeps outputting gibberish. It doesn't seem to make any progress from epoch to epoch. What results have you had so far? Are there any fundamental differences between your corpus and the Nietzsche corpus?

jonilaserson commented 9 years ago

I don't know the Nietzsche corpus. Can you provide a link?

fchollet commented 9 years ago

@jonilaserson

from keras.datasets.data_utils import get_file
path = get_file('nietzsche.txt', origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt")
text = open(path).read().lower()

webmaven commented 8 years ago

Does anyone have some samples of the output from training on the Nietzsche corpus?

fchollet commented 8 years ago

There were some examples in the mailing list: https://groups.google.com/d/msg/keras-users/Y_FG_YEkjXs/PaKAefgbIrQJ

webmaven commented 8 years ago

Ah, thanks, @fchollet!

Quoting here for reference (perhaps this should be added as a comment to the example?):

  • "he has given it the sense of unity and self-control as look to the individuals and platoness of men in the soul and the common power, the madied of morals and presurable and belief in the same time and the conscience of their influence, which is the present the conscience of the common end"
  • "the law is a goversion of the common." (I take it to mean, "law is the government of the plebe")
  • "will the same time and beings and art of the strong and self-distrust of the same and all not only a soul and still store of the same time and artist in sacrifice their own soul, and always the most distrous of a man."
  • "we can nation of everywhere, the strength of the foundation, and also us the most dinge in the master and art"

Is there a way to take the output of a trained LSTM, edit it, and feed the changes back in via some form of backpropagation to further improve the results?

webmaven commented 8 years ago

ping @fchollet

anujgupta82 commented 8 years ago

@bskaggs @fchollet @ganarajpr @Tener @jonilaserson: In his blog post, karpathy talks about "5 example character models". I was wondering if anyone has implemented all 5 architectures in Keras?

mineshmathew commented 8 years ago

I have made small modifications to the text generation example so that it learns in a many-to-many fashion, which is what Karpathy actually does in his implementation.

The code is available here https://github.com/mineshmathew/char_rnn_karpathy_keras

webmaven commented 8 years ago

Thanks for the additional example, @mineshmathew, much appreciated. :+1:

rohan589 commented 8 years ago

@mineshmathew @webmaven Can any of you confirm whether @mineshmathew's implementation has the same inefficiency that fchollet pointed out in his post on June 9, 2015?

mineshmathew commented 8 years ago

@rohan589 yes my implementation has the same 'inefficiency' which fchollet was talking about.

webmaven commented 8 years ago

Are there stateful RNN examples yet?

jithurjacob commented 7 years ago

Any updates on this guys?

mineshmathew commented 7 years ago

@jithurjacob there are examples using stateful RNNs, but I'm not sure if there are any for the language modelling task.

thomasZen commented 7 years ago

@jithurjacob @webmaven here is my take on it: https://github.com/thomasZen/stateful_lstm_keras_text_generation/blob/master/stateful_lstm_text_generation.py It is basically a modification of the text generation example. Using the stateful LSTM is about 10 times faster on my CPU and I seem to get similar results. It is not a very sophisticated implementation, so any suggestions for improvements are welcome.
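
For anyone skimming the thread, the core Keras pieces of the stateful approach are stateful=True, a fixed batch_input_shape, and reset_states() between independent passes; here is a minimal sketch (not taken from the linked script, layer and batch sizes are illustrative):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed, Activation

batch_size, seq_len, n_chars = 64, 40, 60  # illustrative values

model = Sequential()
# stateful=True carries the LSTM state over from one batch to the next,
# so batch i+1 continues where batch i left off in the text.
model.add(LSTM(128, batch_input_shape=(batch_size, seq_len, n_chars),
               return_sequences=True, stateful=True))
model.add(TimeDistributed(Dense(n_chars)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Clear the accumulated state after a full pass over the corpus
# (or before generating from scratch).
model.reset_states()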

DOsinga commented 7 years ago

Something odd is going on with the example at https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py. If I run it as is, after 30 epochs I get to loss: 1.2435 - acc: 0.6137, and the generated text with diversity 0.5 is:

[and frame to "shreaking and a sympathy when the here as the little the impulsed the case of refective of the stricte the exotlecss in the little man of seems to the spirituality and the sanctity in the individual is also the streak--every and all the satisfaction of the old duting one's own heart--what just to a law to]

If I add a layer to the network by inserting: model.add(LSTM(128, input_shape=(maxlen, len(chars)), return_sequences=True)) before the current LSTM, it does a little better and gets to loss: 1.1389 - acc: 0.6449 after 30 epochs. The generated text at diversity 0.5 is:

[of our own profoundest midnight and middle-in science, and in the case of the existence of the rarely in an age are not the pression of free things of man and forgotten of the pression and far from the spectable, and what i have no means the deprese to an actions are the]

If I add another layer though, the network fails to learn at all and just produces gibberish - similarly to what @fchollet said about the initial implementation by @jonilaserson.

Slightly modernizing @mineshmathew's solution, though, works very well with 3 layers:

model = Sequential()
model.add(LSTM(128, input_shape=(None, len(chars)), return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(128, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(len(chars))))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

This gives me loss: 1.0730 - acc: 0.6615 with text generated at 0.5: "aspect of the old mistaken and she is or stonest morality of the historical sense of new christian spirit who has the same delights of the nearness of mankind, and one is possible is a matter of language of acquired to the inmention as a pers"

I would suggest replacing the current example in Keras with @mineshmathew's implementation. Happy to send a pull request.

yxtay commented 7 years ago

I have implemented many-to-many character-based RNN using stateful LSTM as is done by Karpathy. You may consider using it as a reference. https://github.com/yxtay/char-rnn-text-generation/blob/master/keras_model.py

silburt commented 6 years ago

@yxtay I implemented a version of your network for generating new song lyrics and it's working pretty well.