how to encode Korean text input file?

hunkim / word-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for word-level language models in Python using TensorFlow.

MIT License

1.31k stars 495 forks source link

Open majinman opened 6 years ago

majinman commented 6 years ago

When I use Korean test as the input.txt, error occurs like the following.

UnicodeDecodeError: 'cp949' codec can't decode byte 0xff in position 0: illegal multibyte sequence

how can I make computer learn Korean text?

Plz, help me.

patakk commented 6 years ago

If you look at the train.py code, you'll see there's an --input_encoding argument to which you supply some of these. I think you need cp949.