Closed: Esaada closed this issue 5 years ago
Hi, I ran into this problem, too. The cause is that your system's default encoding is not UTF-8 (here Python falls back to ASCII), so open() fails as soon as it reads a non-ASCII byte. It can be solved by passing an explicit encoding argument to open().
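To check what your system uses, here is a minimal sketch. It assumes a Python 3 version where open() in text mode falls back to the locale's preferred encoding when no encoding argument is given; the helper name default_text_encoding is just for illustration.

```python
import locale


def default_text_encoding():
    """Return the encoding open() falls back to when encoding is not given.

    Assumption: Python is not running in UTF-8 mode, so the locale decides.
    """
    return locale.getpreferredencoding(False)


# On an affected system this typically prints something like
# 'ANSI_X3.4-1968' (an alias for ASCII) rather than 'UTF-8'.
print(default_text_encoding())
```

If this prints anything other than UTF-8, files containing characters such as German umlauts will probably need the explicit encoding argument.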
In preprocess.py, line 11:
    with open(inst_file, encoding="utf-8") as f:
In translate.py, line 59:
    with open(opt.output, 'w', encoding='utf-8') as f:
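If you want to see the difference outside the repo, here is a small self-contained sketch. The helper read_lines, the sample sentence, and the file name are all made up for illustration and are not taken from preprocess.py. Forcing encoding="ascii" only mimics a system whose default is ASCII.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file line by line using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f]


# Write a small sample file containing a non-ASCII character (made-up data).
# In UTF-8, "ü" is stored as the two bytes 0xc3 0xbc.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.de.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("ein grüner Baum\n")

# With an explicit UTF-8 encoding the file decodes correctly.
utf8_lines = read_lines(path)

# Forcing ASCII mimics the failing default and raises UnicodeDecodeError.
try:
    read_lines(path, encoding="ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(utf8_lines, ascii_failed)
```

The first byte that ASCII cannot decode in this sample is 0xc3, the same byte reported in the traceback.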
Best.
It worked! Thanks!
Hi, I went through all the steps with no errors or warnings. Then I ran this command:
python3 preprocess.py -train_src data/multi30k/train.en.atok -train_tgt data/multi30k/train.de.atok -valid_src data/multi30k/val.en.atok -valid_tgt data/multi30k/val.de.atok -save_data data/multi30k.atok.low.pt
I got this error:
[Info] Get 29000 instances from data/multi30k/train.en.atok
Traceback (most recent call last):
  File "preprocess.py", line 164, in <module>
    main()
  File "preprocess.py", line 90, in main
    opt.train_tgt, opt.max_word_seq_len, opt.keep_case)
  File "preprocess.py", line 12, in read_instances_from_file
    for sent in f:
  File "/venv/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 14: ordinal not in range(128)
I have all the requirements installed (Python, PyTorch, etc.). What am I missing?