Open saddy001 opened 7 years ago
Interesting, thanks for the report. I'll try to look into it in the next couple weeks. Please feel free to send a PR to fix up the compression and indexing issues if you like.
curl http://www.gutenberg.org/cache/epub/2701/pg2701.txt > corpus.txt
should be
curl http://www.gutenberg.org/cache/epub/2701/pg2701.txt | gunzip -c > corpus.txt
in the docs. The correct index can be seen in my comment above.
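If it is unclear whether the server will send the file compressed or not, the download step could check for that instead of always piping through gunzip. Here is a rough Python sketch of that idea. It assumes the compressed variant is plain gzip (which the gunzip fix above suggests); `maybe_gunzip` and `fetch_corpus` are hypothetical helper names, not part of the library.

```python
import gzip
import urllib.request

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress data if it looks like gzip, otherwise return it unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


def fetch_corpus(url: str, path: str = "corpus.txt") -> None:
    """Download url and write the (decompressed, if needed) bytes to path.

    Hypothetical helper for illustration only.
    """
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    with open(path, "wb") as f:
        f.write(maybe_gunzip(raw))
```

Calling `fetch_corpus("http://www.gutenberg.org/cache/epub/2701/pg2701.txt")` should then produce a readable corpus.txt whether or not the response is gzipped.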
Hi,
I think there's something wrong with the quick start example. I see rising accuracy but no real words:
In the example, meaningful words emerge at around 50% ACC. I had to make two small changes to the example. First, the corpus is now served compressed by Project Gutenberg, so I had to decompress it. Second, I had to change
seed = txt.encode(txt.text[300017:300050])
to
seed = txt.encode(txt.text[300015:300048])
to get the same seed sentence.
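Since the hard-coded offsets shift whenever the upstream file changes, one possible way to make the seed robust is to look up the sentence by its text rather than by position. This is only a sketch: `find_seed_span` is a hypothetical helper, `phrase` stands for whatever text the seed is supposed to start with, and the default length of 33 simply matches the width of the slices above.

```python
def find_seed_span(text: str, phrase: str, length: int = 33) -> tuple[int, int]:
    """Return (start, end) of a length-character window starting at phrase.

    Hypothetical helper: raises ValueError if phrase does not occur in text.
    """
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"phrase not found in corpus: {phrase!r}")
    return start, start + length
```

With that, the seed line could become something like `start, end = find_seed_span(txt.text, phrase)` followed by `seed = txt.encode(txt.text[start:end])`, assuming `txt.text` is a plain string as the slicing in the example implies.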