ibab / tensorflow-wavenet

A TensorFlow implementation of DeepMind's WaveNet paper

My implementation of WaveNet for text generation (based on this repository) #117

Closed Zeta36 closed 8 years ago

Zeta36 commented 8 years ago

Hi, friends.

As I don't have a good GPU to help you directly with this, I used the work in this repository as a baseline to develop a WaveNet text generator (self-generator): https://github.com/Zeta36/tensorflow-tex-wavenet.

In summary: I use the WaveNet model as a text generator. I feed the model raw text data (characters) instead of raw audio files, and once the network is trained, I use the learned conditional probabilities to generate samples (characters) in a self-generating process.
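Conceptually, the self-generating loop is just this (a rough numpy sketch; `predict_next_char_probs` is a placeholder for the trained network's forward pass, not a function in the repo):

```python
import numpy as np

def generate(predict_next_char_probs, seed=" ", length=500):
    """Sample characters one by one from the model's conditional distribution."""
    text = list(seed)
    for _ in range(length):
        probs = predict_next_char_probs(text)        # distribution over the 256 values
        next_byte = np.random.choice(256, p=probs)   # sample the next character
        text.append(chr(next_byte))
    return "".join(text)
```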

Only characters with 8-bit codes (decimal 0 up to 255) are supported right now.

Results

And pretty interesting results are reached!! Given enough text and training, the model is able to learn the probability distribution of character sequences in a language, and later generate remarkably similar text!!

For example, using the Penn Treebank (PTB) dataset, after only 15000 steps of training (with a low parameter setting), this was the self-generated output (the final loss was around 1.1):

"Prediction is: 300-servenns on the divide mushin attore and operations losers nis called him for investment it was with as pursicularly federal and sotheby d. reported firsts truckhe of the guarantees as paining at the available ransions i 'm new york for basicane as a facerement of its a set to the u.s. spected on install death about in the little there have a $ N million or N N bilot in closing is of a trading a congress of society or N cents for policy half feeling the does n't people of general and the crafted ended yesterday still also arjas trading an effectors that a can singaes about N bound who that mestituty was below for which unrecontimer 's have day simple d. frisons already earnings on the annual says had minority four-$ N sance for an advised in reclution by from $ N million morris selpiculations the not year break government these up why thief east down for his hobses weakness as equiped also plan amr. him loss appealle they operation after and the monthly spendings soa $ N million from cansident third-quarter loan was N pressure of new and the intended up he header because in luly of tept. N million crowd up lowers were to passed N while provision according to and canada said the 1980s defense reporters who west scheduled is a volume at broke also and national leader than N years on the sharing N million pro-m was our american piconmentalist profited himses but the measures from N in N N of social only announcistoner corp. say to average u.j. dey said he crew is vice phick-bar creating the drives will shares of customer with welm reporters involved in the continues after power good operationed retain medhay as the end consumer whitecs of the national inc. closed N million advanc"

This is really wonderful!! We can see that the original WaveNet model has a great capacity to learn and store long stretches of encoded text information inside its nodes (and not only audio or image information). This "text generator" WaveNet was able to learn how to write English words and phrases just by predicting characters one by one, and sometimes it even learned which word to use based on context.

This output is far from perfect, but it was trained on a CPU-only machine (no GPU) with a low parameter configuration in just two hours!! I hope somebody with a better computer can explore the potential of this implementation.

You can download the new development in here: https://github.com/Zeta36/tensorflow-tex-wavenet.

Technically:

1) I made a TextReader for feeding the data, replacing the AudioReader.
2) I used the character's decimal value (0-255) as the 8-bit sample and removed the mu_law function from everywhere (a rough sketch of points 1 and 2 is shown just after this list).
3) Removed all TensorBoard summaries (I have no memory to waste :P).
4) Removed write_wav() and wrote a write_text() instead.
5) Some other minor changes: I always start the "waveform" with a space (char 32) instead of a random int, changed some terminal arguments, etc.
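An illustrative sketch only (not the exact code in the linked repo) of reading text as 8-bit samples and decoding generated samples back:

```python
import numpy as np

def load_text_as_samples(path):
    """Read a text file and use each character's 8-bit value (0-255) as a sample,
    taking the place of the mu-law encoded audio samples."""
    with open(path, "rb") as f:
        data = f.read()
    return np.frombuffer(data, dtype=np.uint8).astype(np.int32)

def decode_samples(samples):
    """Turn generated samples back into text (the write_text() counterpart)."""
    return bytes(int(s) for s in samples).decode("ascii", errors="replace")
```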

And that's all!!

I hope this can help you in any way.

Best regards, Samu.

ibab commented 8 years ago

Very cool results, the network seems to like "$", "N" and "million" :D This makes me wonder whether it would deal well with extra conditioning information like text sentiment, etc.

Zeta36 commented 8 years ago

Thank you, @ibab. We could try something along those lines once local conditioning is implemented.

I wonder if some "network" like this could exist in the cortex of our brain. Maybe something similar to this self-generating "network" could be the biological basis of our long-term memory.

jyegerlehner commented 8 years ago

@zeta36 Well done!

It makes up some brilliant non-words.

"selpiculations" I feel like I should know what that means.

"facerement", "cansident" sound like french words :) I think any word I don't know that ends with "-ment" must be such.

Feel free to offer up a PR or two related to this. Abstracting the audio_reader into input_reader, and providing specializations via audio_reader and text_reader would be valuable.

3) Removed all TensorBoard summaries (I have no memory to waste :P).

That would be a welcome option too.

Zeta36 commented 8 years ago

Thank you, @jyegerlehner :).

Training with more text files, I get even better results than the one I described above. I wonder what somebody with a really good GPU could achieve training over a huge number of txt files with a high parameter setting (I used only the basic configuration of the master branch, setting only bias to true). I think this text generator could even learn to write long phrases based on context.

You said:

"Feel free to offer up a PR or two related to this. Abstracting the audio_reader into input_reader, and providing specializations via audio_reader and text_reader would be valuable.

3) Removed all TensorBoard summaries (I have no memory to waste :P). That would be a welcome option too."

But I'm not sure if @ibab would agree with such a big change.

Regards.

ibab commented 8 years ago

@Zeta36: I'd definitely agree to a PR :) We should think of how to handle the mu law encoding in this case. For example, we could move it into the AudioReader. Then the network would be generic enough to work with a TextReader.
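Roughly what I have in mind (just a sketch, the class and method names are illustrative): each reader hands the network quantized integer samples, and only the audio reader applies the mu-law companding.

```python
import numpy as np

def mu_law_encode(audio, quantization_channels=256):
    """Mu-law companding: map a waveform in [-1, 1] to integer classes."""
    mu = quantization_channels - 1
    magnitude = np.log1p(mu * np.minimum(np.abs(audio), 1.0)) / np.log1p(mu)
    signal = np.sign(audio) * magnitude
    return ((signal + 1) / 2 * mu + 0.5).astype(np.int32)

class AudioReader:
    def next_samples(self, waveform):
        return mu_law_encode(waveform)  # mu-law lives only here

class TextReader:
    def next_samples(self, text):
        # characters are already discrete, so just take the raw byte values
        return np.frombuffer(text.encode("latin-1"), dtype=np.uint8).astype(np.int32)
```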

Nyrt commented 8 years ago

Oh man! I was actually considering trying to do this myself to get a better grasp of the underlying architecture. Definitely going to check this out.

mjwillson commented 8 years ago

This is neat, although I wonder if it's actually competitive as a language model? It would be good to compare it on perplexity against a fair baseline for the character-based language modelling task, e.g. a character RNN with a similar number of parameters and/or similar runtime cost (perhaps using GRU or LSTM cells).

ddofer commented 8 years ago

Perplexity wise, how does this compare to traditional char/word level LSTM models?

Zeta36 commented 8 years ago

I made an ImageReader using a single grayscale channel (1D vector) that can also (approximately) simulate the DeepMind PixelCNN paper. Look here: #125
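(The idea, roughly sketched with illustrative names only, is to flatten the grayscale image into a 1D sequence of 8-bit values so the network treats pixels exactly like characters:)

```python
import numpy as np
from PIL import Image

def image_to_samples(path):
    """Flatten a grayscale image into a 1D sequence of 8-bit samples."""
    img = Image.open(path).convert("L")         # single grayscale channel
    pixels = np.asarray(img, dtype=np.uint8)    # H x W array of pixel intensities
    return pixels.reshape(-1).astype(np.int32)  # row-major 1D vector
```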

hardmaru commented 8 years ago

A suggestion would be to train the model on the proper train/validation/test split of PTB (https://github.com/yoonkim/lstm-char-cnn/tree/master/data/ptb), and calculate the bits-per-character your model obtains on the test set (using the validation set for early stopping).

This will let you compare how well the model scores against other models in the recent literature on an identical baseline task.
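For reference, bits-per-character is just the average next-character cross-entropy expressed in base 2, so something like this (the ~1.1 training loss mentioned above would be about 1.59 bpc, though for a fair comparison it has to be measured on the test set):

```python
import numpy as np

def bits_per_character(mean_cross_entropy_nats):
    """Convert an average per-character cross-entropy (in nats) to bits per character."""
    return mean_cross_entropy_nats / np.log(2)

print(bits_per_character(1.1))  # ~1.59 bpc
```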

jhave commented 8 years ago

Thanks @Zeta36 ! I've been using WaveNet for text generation on an 11K corpus of poems, and find the results far more readable and sophisticated than those of other character-based LSTM models or GANs.

-- I would be very interested to know if there is an article / blog post on the implications of each of the wavenet_params.

Zeta36 commented 8 years ago

Please, @jhave could you paste over here some results of your trained model??

nakosung commented 8 years ago

Nice to see you on github, @hardmaru !

hardmaru commented 8 years ago

Hi there!

Not sure why I'm on this thread - not an expert on wavenets~


jhave commented 8 years ago

Hi Samuel!

Thanks so much for your amazing code. I'd love to post, but I'm a GitHub newbie and semi-incompetent: how can I post? I receive the following message:

Uploads are disabled.

File uploads require push access to this repository.


Zeta36 commented 8 years ago

@jhave, but what exactly do you want to post, and where? I don't understand you, friend.

Regards.

jhave commented 8 years ago

Aah, now it's my turn to be confused. You wrote me: "Please, @jhave could you paste over *here* some results of your trained model??"


mainrs commented 8 years ago

I have a 1070 GPU that I could use to generate some sample text if you guys want to.

Zeta36 commented 8 years ago

It's true, @jhave. I didn't remember that. You can upload the file to a fork of this project, and then post a link to the file here.

jhave commented 8 years ago

Hi folks: poetry results from using this code on a poetry corpus are here: http://bdp.glia.ca/wavenet-for-poem-generation-preliminary-results (I should have forked the project; I'm such a git newbie that I merely downloaded it).

veqtor commented 6 years ago

I'm experimenting with this repo for text and getting some really interesting results training on 5 books; it really starts to produce interesting stuff above step 40K... Also, a really big dataset and a quite slow learning rate (0.0001), along with WIDE channels (256), seem to be the trick!

I remember that when I trained an audio WaveNet, I really got the best results after around 200K steps with 10K samples per step...

Also, I modified it to use gradient clipping; I think this might also improve audio WaveNet training.
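The gradient clipping change is essentially the standard TF 1.x pattern (a sketch, not my exact code; the clip norm of 5.0 is just an illustrative value):

```python
import tensorflow as tf

def clipped_train_op(loss, learning_rate=1e-4, clip_norm=5.0):
    """Build a train op that clips gradients by global norm before applying them."""
    optimizer = tf.train.AdamOptimizer(learning_rate)
    gradients, variables = zip(*optimizer.compute_gradients(loss))
    clipped, _ = tf.clip_by_global_norm(gradients, clip_norm)
    return optimizer.apply_gradients(zip(clipped, variables))
```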