karpathy / char-rnn

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

Hints for a Unified Religion Text Generator #126

Closed davidepatti closed 8 years ago

davidepatti commented 8 years ago

Ok, don't ask me why I'm doing this; maybe karpathy is the main suspect for having provided us this beautiful tool :) So, I was trying to train a 700-unit network on the merged txt of the Quran + Bible + a Buddhist sutra, in order to generate some mystic text where different actors could interact and (why not?) give us some otherwise unpredictable wisdom... Imagine Jesus talking about Mohamed, or Moses in a mosque where Buddha is doing something... I hope you get the idea. Max respect for people's beliefs; it's just a way to experiment, and if a God created this world, he/she/it also created RNNs :)

My problem is that, using sequentially merged txt files, the generation process seems too dependent on the current local position. For example, once it starts on the Old Testament it stays there, never jumping to very distant regions of the training text. Things don't mix when the character distance is too large. I know this is expected behaviour, since characters that are far apart in the corpus are never presented within the same training window. Does anyone have a suggestion for parameters or code modifications that could change this behaviour? Thank you

moscow25 commented 8 years ago

This is what happens, when you leave out the Talmud...

But seriously, you would expect that, given three books of very different style and vocabulary, once it starts on one it would not move to the others. The whole point is that it generates text B which might plausibly follow text A in the source material. This is somewhat of an extreme case, kind of like the example of GNU/MIT license text showing up in "generated code".

Maybe you could use this, to investigate how many distinct authors might have written the various books of the bible...

Best, N


davidepatti commented 8 years ago

...sorry for forgetting the Talmud in my example :) surely supported... I agree with you that concatenating books with different styles/topics is itself "a style", so the network learns that those different contents should stay distant. I think, at this point, that the best thing to do is some kind of pre-processing that simulates having all the books in parallel. I'll mix the paragraphs and let the network do the hard work of making content emerge from this comprehensive and simultaneous vision :)
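For anyone wanting to try the same trick, a minimal sketch of the paragraph-mixing idea could look like this (Python, assuming paragraphs are separated by blank lines; the tiny stand-in strings and any file paths are placeholders, not part of char-rnn):

```python
import random

def interleave_paragraphs(texts, seed=0):
    """Split each document on blank lines and shuffle all paragraphs together,
    so stylistically different sources sit close to each other in the corpus."""
    paragraphs = []
    for text in texts:
        # Keep only non-empty paragraphs from each source document.
        paragraphs.extend(p.strip() for p in text.split("\n\n") if p.strip())
    # Fixed seed makes the mixed corpus reproducible across runs.
    random.Random(seed).shuffle(paragraphs)
    return "\n\n".join(paragraphs)

# Example with three tiny stand-in "books"; the real inputs would be the
# full texts read from disk.
mixed = interleave_paragraphs([
    "q1\n\nq2",
    "b1\n\nb2",
    "s1\n\ns2",
])
```

The mixed result would then be written out as the `input.txt` that char-rnn trains on, in place of the sequentially concatenated file.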

karpathy commented 8 years ago

Seconding @moscow25: you have to intertwine the passages. Good luck, and let us know how it goes! :)