jadore801120 / attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

MIT License

'ascii' codec can't decode byte 0xc3 , What am I missing?! #71

Esaada closed this issue 5 years ago

Esaada commented 5 years ago

Hi, I went through all the steps with no errors or warnings. Then I ran this command:

    python3 preprocess.py -train_src data/multi30k/train.en.atok -train_tgt data/multi30k/train.de.atok -valid_src data/multi30k/val.en.atok -valid_tgt data/multi30k/val.de.atok -save_data data/multi30k.atok.low.pt

and got this error:

    [Info] Get 29000 instances from data/multi30k/train.en.atok
    Traceback (most recent call last):
      File "preprocess.py", line 164, in <module>
        main()
      File "preprocess.py", line 90, in main
        opt.train_tgt, opt.max_word_seq_len, opt.keep_case)
      File "preprocess.py", line 12, in read_instances_from_file
        for sent in f:
      File "/venv/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)

I have all the requirements installed (Python, PyTorch, etc.). What am I missing?
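Byte 0xc3 is not valid ASCII, which only defines bytes 0x00 to 0x7f. In UTF-8 it is the lead byte of two-byte sequences covering many accented Latin letters, including the German umlauts found in train.de.atok (the file being read when the traceback fires). A minimal sketch that reproduces the same error with only the standard library; the sample word is an assumption for illustration, not taken from the dataset:

```python
# Hypothetical sample word (not from the dataset); German text such as
# the Multi30k target side contains umlauts like "ä".
word = "Mädchen"
raw = word.encode("utf-8")
print(raw)  # b'M\xc3\xa4dchen'

try:
    # ASCII only covers bytes 0-127, so 0xC3 (195) cannot be decoded.
    raw.decode("ascii")
except UnicodeDecodeError as err:
    message = str(err)

print(message)  # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

# Decoding with the codec the bytes were actually written in recovers the text.
print(raw.decode("utf-8"))  # Mädchen
```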

elvisyjlin commented 5 years ago

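Hi, I ran into this problem too. The cause is that your system's default encoding is not UTF-8: when open() is called without an encoding, it decodes the file with the locale's default codec (ASCII in your case), but the data files contain UTF-8 characters. It can be fixed by passing an explicit encoding argument to open().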
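To see which codec open() falls back to, you can check locale.getpreferredencoding(False); on Python 3.5, as in the traceback above, a C or POSIX locale typically reports ASCII there. The sketch below prints that value and then reads a small UTF-8 file with the encoding passed explicitly, so the result no longer depends on the locale. The temporary file and its contents are assumptions for illustration:

```python
import locale
import os
import tempfile

# The codec text-mode open() uses when no encoding= argument is given
# (on Python 3.5; newer versions may differ, e.g. in UTF-8 mode).
print("default text encoding:", locale.getpreferredencoding(False))

# Write a small hypothetical UTF-8 sample file.
fd, path = tempfile.mkstemp(suffix=".atok")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("ein Mädchen läuft .\n")

# Passing encoding explicitly makes the read independent of the locale.
with open(path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(path)
print(lines)  # ['ein Mädchen läuft .']
```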

In preprocess.py line 11,

    with open(inst_file, encoding="utf-8") as f:

In translate.py line 59,

    with open(opt.output, 'w', encoding='utf-8') as f:
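For context, here is a hypothetical, simplified reader and writer with both changes applied. It is not the repository's actual read_instances_from_file; the lowercasing and truncation logic is an assumption based only on the argument names visible in the traceback (max_word_seq_len, keep_case), and the helper names read_token_lines and write_lines are made up for this sketch:

```python
import os
import tempfile


def read_token_lines(path, max_len, keep_case):
    """Hypothetical reader: one whitespace-tokenized sentence per line.

    encoding="utf-8" is the fix; without it, Python uses the locale's
    default codec, which may be ASCII.
    """
    sentences = []
    with open(path, encoding="utf-8") as f:
        for sent in f:
            if not keep_case:
                sent = sent.lower()
            # Truncate long sentences (assumed behaviour, for illustration).
            sentences.append(sent.split()[:max_len])
    return sentences


def write_lines(path, lines):
    """Hypothetical writer: saves output as UTF-8, like the translate.py change."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "sample.de.atok")
    write_lines(src, ["Zwei Männer stehen .", "Ein Hund läuft über die Wiese ."])
    result = read_token_lines(src, max_len=4, keep_case=False)

print(result)  # [['zwei', 'männer', 'stehen', '.'], ['ein', 'hund', 'läuft', 'über']]
```

Specifying the encoding on both the read and the write side keeps the pipeline consistent: files written as UTF-8 are read back as UTF-8 regardless of the machine's locale.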

Best.

Esaada commented 5 years ago

It worked! Thanks!