Closed RookieCXL closed 5 years ago
Hi @RookieCXL,
it seems that the file you're trying to translate is in an encoding other than UTF-8. I suggest converting your file to UTF-8 to avoid these encoding issues.
Feel free to reopen this issue if the error persists after converting your file.
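For anyone hitting the same problem, the conversion suggested above could be sketched roughly as follows. This is only a minimal sketch: the helper name `convert_to_utf8`, the file names in the usage note, and the default source encoding `gb18030` are assumptions, not something taken from nmt-keras. GB18030 is a guess that often fits Chinese text saved on Windows; if the file is in a different encoding, pass that one instead.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="gb18030"):
    """Re-encode a text file as UTF-8.

    source_encoding is an assumption (GB18030 is a superset of GBK/GB2312);
    replace it with the real encoding of your file if you know it.
    """
    # Read raw bytes and decode explicitly, so a wrong guess raises
    # UnicodeDecodeError here instead of silently producing garbage.
    text = Path(src).read_bytes().decode(source_encoding)
    # Write the same text back out as UTF-8 bytes (line endings untouched).
    Path(dst).write_bytes(text.encode("utf-8"))
    return dst
```

Usage would look something like `convert_to_utf8("test.zh", "test.utf8.zh")` (hypothetical file names), after which the converted file is the one to pass to sample_ensemble.py.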
I trained the model on another dataset, for Chinese to English. After training, when I run sample_ensemble.py to translate a Chinese text, I get the following error.
2019-05-17 09:35:00.359418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1409 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:02:00.0, compute capability: 5.0)
[17/05/2019 09:35:00] <<< Loading optimized model... >>>
[17/05/2019 09:35:03] <<< Optimized model loaded. >>>
[17/05/2019 09:35:03] <<< Model loaded in 9.3835 seconds. >>>
[17/05/2019 09:35:03] <<< Loading Dataset instance from datasets\Dataset_ZhEnTrans_zhen.pkl ... >>>
[17/05/2019 09:35:03] <<< Dataset instance loaded >>>
[17/05/2019 09:35:03] Removed "val" set output with id "target_text.
Traceback (most recent call last):
File "sample_ensemble.py", line 62, in
sample_ensemble(args, params)
File "C:\Users\think\Desktop\nmt-keras-master\nmt_keras\apply_model.py", line 41, in sample_ensemble
dataset = update_dataset_from_file(dataset, args.text, params, splits=args.splits, remove_outputs=True)
File "C:\Users\think\Desktop\nmt-keras-master\data_engine\prepare_data.py", line 79, in update_dataset_from_file
overwrite_split=True)
File "c:\users\think\src\keras-wrapper\keras_wrapper\dataset.py", line 1042, in setInput
bpe_codes=bpe_codes, separator=separator, use_unk_class=use_unk_class)
File "c:\users\think\src\keras-wrapper\keras_wrapper\dataset.py", line 1693, in preprocessTextFeatures
for line in list:
File "C:\Users\think\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 711, in __next__
return next(self.reader)
File "C:\Users\think\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 642, in __next__
line = self.readline()
File "C:\Users\think\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 555, in readline
data = self.read(readsize, firstline=True)
File "C:\Users\think\AppData\Local\Programs\Python\Python36\lib\codecs.py", line 501, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
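The last line of the traceback can be reproduced in isolation, which may help confirm what kind of file is involved. The sketch below is illustrative only: `can_decode` is a hypothetical helper, and the idea that the input file is GBK-encoded is an assumption, based on the fact that byte 0xd6 is a common lead byte in GBK/GB2312 text (for example, the character 中 is encoded there as the two bytes D6 D0).

```python
def can_decode(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Assumption for illustration: a Chinese character saved as GBK.
sample = "中".encode("gbk")  # the two bytes 0xd6 0xd0

# In UTF-8, 0xd6 starts a two-byte sequence, but 0xd0 is not a valid
# continuation byte, so strict decoding fails at position 0.
print(can_decode(sample, "utf-8"))  # False
print(can_decode(sample, "gbk"))    # True
```

Running the same check on the first few hundred bytes of the real input file (read with `open(path, "rb")`) is one way to test candidate encodings before converting it.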