sherjilozair / char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
MIT License
2.64k stars, 960 forks

Support Unicode Input Files #38

Open john-parton opened 8 years ago

john-parton commented 8 years ago

Python 2's open() returns byte strings by default. It would be nice if TextLoader.preprocess read the file and decoded it to unicode.
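A minimal sketch of the idea, not the repository's actual code: `io.open` accepts an `encoding` argument on both Python 2 and Python 3 and returns decoded text, so the vocabulary is built over characters rather than bytes. The helper names `read_text_as_unicode` and `build_vocab` are hypothetical, and UTF-8 is an assumed default encoding.

```python
import collections
import io


def read_text_as_unicode(path, encoding="utf-8"):
    """Read a whole file and return decoded text (hypothetical helper)."""
    # io.open decodes while reading, on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def build_vocab(text):
    """Map each distinct character to an integer id (hypothetical helper)."""
    counter = collections.Counter(text)
    # Most frequent characters first; ties broken by the character itself
    # so the ordering is deterministic.
    pairs = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    chars = [c for c, _ in pairs]
    vocab = {c: i for i, c in enumerate(chars)}
    return chars, vocab
```

With byte-oriented reading, a character such as "é" would be split into two vocabulary entries (its two UTF-8 bytes); after decoding it is a single entry.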

I'll submit a PR.

Thanks.

john-parton commented 8 years ago

I got unicode input files working, but sample.py wasn't unicode aware. I'll submit a new PR with a proposed fix.
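The thread does not say which part of sample.py failed. As an assumption, one likely spot is text crossing the program boundary: a priming string arriving as bytes, and sampled text being written out. The sketch below shows that boundary handling with two hypothetical helpers, `to_text` and `to_bytes`, again assuming UTF-8; it is not the fix from the PR.

```python
def to_text(value, encoding="utf-8"):
    """Return decoded text, decoding only if `value` is a byte string."""
    # Assumption: byte input (e.g. a Python 2 command-line argument)
    # is encoded as UTF-8.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(text, encoding="utf-8"):
    """Encode text for writing to a byte-oriented stream or file."""
    return text.encode(encoding)
```

Decoding on the way in and encoding on the way out keeps everything in between, including vocabulary lookups, working on whole characters.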