Open f2012444 opened 6 years ago
Also came here for this. Google shows a few courses that also load text from shakespeare.txt; all the examples link to a copy of Shakespeare from Project Gutenberg, link below.
http://www.gutenberg.org/files/100/100-0.txt
I haven't tried loading this yet. It will likely need one of the tools for cleaning up Project Gutenberg texts (i.e. removing headers etc.).
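For the header removal mentioned above, a minimal sketch in Python might look like this. It assumes the file wraps the actual text between "*** START OF THE/THIS PROJECT GUTENBERG EBOOK ..." and "*** END OF ..." marker lines; the exact wording varies between files, so the patterns (and the function name, which is made up here) may need adjusting.

```python
import re

# Assumed marker patterns; check them against the actual file before relying on this.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text):
    """Return the text between the start and end marker lines.

    If a marker is missing, fall back to the beginning or end of the file.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()
```

Something like `strip_gutenberg_boilerplate(open("100-0.txt", encoding="utf-8").read())` would then give the body without the header and license text, assuming the markers match.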
Very rough cleanup. Used chapterize for a first-pass cleanup of http://www.gutenberg.org/files/100/100-0.txt by running: chapterize shakespear_all.txt --nochapters
Then manually copied out the first book, 'Sonnets', and ran this ugly tool to strip blank lines and lines with ints: https://github.com/aspiringguru/practical-pytorch/blob/master/data/gutenberg_cleanup.py
Resulting in this (needs more cleanup, but eh, will see how the notebook copes): https://github.com/aspiringguru/practical-pytorch/blob/master/data/shakespear_sonnets_out.txt
I'm not proud of it. :)
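For anyone who wants to redo the blank-line and number stripping step without the linked script, here is a rough sketch. It is not the code from gutenberg_cleanup.py; it assumes "lines with ints" means lines that contain only digits (such as sonnet numbers), and the function name is hypothetical.

```python
def drop_blank_and_numeric_lines(lines):
    """Keep only lines that have text and are not purely numeric.

    Assumption: a "line with ints" is a line made up only of digits,
    e.g. a sonnet number on its own line.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.isdigit():
            continue  # skip blank lines and number-only lines
        kept.append(stripped)
    return kept
```

Reading the input with `open(path, encoding="utf-8")` and writing back `"\n".join(kept)` would produce a file similar in spirit to the one linked above.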
This file from karpathy's repository has the same length: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt
👍 That's the one. As mentioned in the readme: https://github.com/spro/practical-pytorch/tree/master/char-rnn-generation
The shakespeare.txt file used in char-rnn-generation is not present in the data folder.