spro / practical-pytorch

Go to https://github.com/pytorch/tutorials - this repo is deprecated and no longer maintained
MIT License

shakespeare.txt is not found #102

Open f2012444 opened 6 years ago

f2012444 commented 6 years ago

The shakespeare.txt file used in char-rnn-generation is not present in the data folder.

aspiringguru commented 6 years ago

Also came here for this. Google shows me a few courses that also load text from shakespeare.txt; all the examples link to a copy of Shakespeare from Project Gutenberg, link below.

http://www.gutenberg.org/files/100/100-0.txt

I haven't tried loading this yet. It will likely need one of the tools for cleaning up Project Gutenberg texts (i.e. removing headers etc.).
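For the header removal mentioned above, here is a minimal sketch in Python. It assumes the file wraps its body in the usual `*** START OF ... PROJECT GUTENBERG ...` and `*** END OF ... PROJECT GUTENBERG ...` marker lines; the exact wording varies between files, so check your copy first. The function name `strip_gutenberg_boilerplate` is made up for this example and is not part of any tool named in this thread.

```python
import re

# Assumption: the body sits between marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***" and
# "*** END OF THE PROJECT GUTENBERG EBOOK ... ***".
START_RE = re.compile(
    r"^\*\*\*\s*START OF .*PROJECT GUTENBERG.*$", re.IGNORECASE | re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF .*PROJECT GUTENBERG.*$", re.IGNORECASE | re.MULTILINE
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, fall back to the beginning or end of the text.
    """
    start = START_RE.search(text)
    body_start = start.end() if start else 0
    # Only look for the END marker after the START marker.
    end = END_RE.search(text, body_start)
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()
```

Typical use would be to read the downloaded file with `open(path, encoding="utf-8")`, pass its contents through this function, and write the result to a new file.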

aspiringguru commented 6 years ago

Very rough cleanup. Used chapterize for a first-pass cleanup of http://www.gutenberg.org/files/100/100-0.txt:

    chapterize shakespear_all.txt --nochapters

Then I manually copied out the first book, 'sonnets', and ran this ugly tool to strip blank lines and lines with ints: https://github.com/aspiringguru/practical-pytorch/blob/master/data/gutenberg_cleanup.py

Resulting in this (needs more cleanup, but eh, will see how the notebook copes): https://github.com/aspiringguru/practical-pytorch/blob/master/data/shakespear_sonnets_out.txt

I'm not proud of it. :)
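For anyone who wants the gist of that cleanup step without opening the script, a rough sketch follows. It is not a copy of gutenberg_cleanup.py; it assumes "lines with ints" means lines that contain only a number (such as sonnet numbers), and the function name is invented for illustration.

```python
def drop_blank_and_numeric_lines(text: str) -> str:
    """Remove blank lines and lines that consist only of digits."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank or whitespace-only line
        if stripped.isdigit():
            continue  # assumption: a bare number is a sonnet/section number
        kept.append(line.rstrip())
    # Keep a trailing newline only when there is any content left.
    return ("\n".join(kept) + "\n") if kept else ""
```

Lines that merely contain a digit somewhere in the verse are kept, which seemed the safer reading for a character-level corpus.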

zehongs commented 6 years ago

This file from karpathy's repository has the same length: https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt

spro commented 6 years ago

👍 That's the one. As mentioned in the readme: https://github.com/spro/practical-pytorch/tree/master/char-rnn-generation
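To get that file into place without clicking through GitHub, something like the sketch below should work. It assumes the usual `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` layout for raw files and a branch name without slashes; both helper names and the destination path in the usage note are illustrative only, so check the readme or notebook for the path it actually expects.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into its raw-content URL."""
    parts = urlsplit(url).path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / branch / path...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner, repo, _, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join(
        [owner, repo, branch, *path]
    )


def fetch_corpus(blob_url: str, dest: str) -> Path:
    """Download the file behind a GitHub blob URL and save it to dest."""
    target = Path(dest)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(blob_to_raw(blob_url)) as response:
        target.write_bytes(response.read())
    return target
```

Usage would look like `fetch_corpus("https://github.com/karpathy/char-rnn/blob/master/data/tinyshakespeare/input.txt", "data/shakespeare.txt")`, where `data/shakespeare.txt` is a guess at the location based on the issue description.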