zjy-ucas / ChineseNER

A neural network model for Chinese named entity recognition

'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte #21

Open janesunflower opened 6 years ago

janesunflower commented 6 years ago

Traceback (most recent call last):
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev\pydevd.py", line 2358, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev\pydevd.py", line 1778, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "E:\python2.7\pycharm\PyCharm 4.5.5\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "ChineseNER-master/main.py", line 225, in <module>
    tf.app.run(main)
  File "tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "ChineseNER-master/main.py", line 219, in main
    train()
  File "ChineseNER-master/main.py", line 185, in train
    best = evaluate(sess, model, "dev", dev_manager, id_to_tag, logger)
  File "ChineseNER-master/main.py", line 85, in evaluate
    eval_lines = test_ner(ner_results, FLAGS.result_path)
  File "ChineseNER-master\utils.py", line 66, in test_ner
    eval_lines = return_report(output_file)
  File "ChineseNER-master\conlleval.py", line 282, in return_report
    counts = evaluate(f)
  File "ChineseNER-master\conlleval.py", line 74, in evaluate
    for line in iterable:
  File "tensorflow\lib\codecs.py", line 713, in next
    return next(self.reader)
  File "tensorflow\lib\codecs.py", line 644, in next
    line = self.readline()
  File "tensorflow\lib\codecs.py", line 557, in readline
    data = self.read(readsize, firstline=True)
  File "tensorflow\lib\codecs.py", line 501, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

I am on TensorFlow 1.3. Has anyone run into a similar problem? Is there a fix?

lengxia commented 6 years ago

Have you solved it yet?

janesunflower commented 6 years ago

Not yet, sadly.

yyHaker commented 6 years ago

What's wrong with it? My TensorFlow is 1.4.

yyHaker commented 6 years ago

I found that this problem can be solved as follows.

In utils.py, change test_ner so that the output file is opened with an explicit encoding:

    def test_ner(results, path):
        """
        Run perl script to evaluate model
        """
        output_file = os.path.join(path, "ner_predict.utf8")
        with open(output_file, "w", encoding='utf8') as f:
            to_write = []
            for block in results:
                for line in block:
                    to_write.append(line + "\n")
                to_write.append("\n")
            f.writelines(to_write)
        eval_lines = return_report(output_file)
        return eval_lines

The reason is that the file can only be read back as UTF-8 if it was written as UTF-8. It has nothing to do with the TensorFlow version.
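To illustrate the point above, here is a minimal sketch, not code from the repository. It writes a short sample with one encoding and reads it back with another. The helper name `decodes_cleanly` and the sample text are made up for the illustration. The sample starts with a fullwidth comma, which GBK encodes starting with byte 0xa3, the byte reported in the error. On a Chinese-locale Windows machine, `open()` without an `encoding` argument typically falls back to a GBK-family code page, which is presumably how the mismatch arose.

```python
import os
import tempfile


def decodes_cleanly(text, write_encoding, read_encoding):
    """Write text with one encoding and report whether it reads back intact with another."""
    fd, path = tempfile.mkstemp(suffix=".utf8")
    os.close(fd)
    try:
        with open(path, "w", encoding=write_encoding) as f:
            f.write(text)
        try:
            with open(path, "r", encoding=read_encoding) as f:
                return f.read() == text
        except UnicodeDecodeError:
            # The reader's codec rejected the bytes the writer produced.
            return False
    finally:
        os.remove(path)


# Hypothetical sample line: a fullwidth comma followed by a tag.
sample = "\uff0c O\n"

same_encoding_ok = decodes_cleanly(sample, "utf-8", "utf-8")
mismatch_ok = decodes_cleanly(sample, "gbk", "utf-8")
print(same_encoding_ok, mismatch_ok)
```

Passing `encoding` explicitly on both the writing and the reading side removes the dependence on the platform default.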

lengxia commented 6 years ago

@yyHaker, good job. It helped me solve this problem, thanks.

ylwctyt commented 6 years ago

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 0: invalid start byte

2018-02-02 12:08:27,160 - log\train.log - INFO - iteration:1 step:1000/1044, NER loss: 5.380470
2018-02-02 12:08:36,132 - log\train.log - INFO - evaluate:dev
Traceback (most recent call last):

It runs up to this point and then hits the same encoding problem. Has anyone else run into this?

yyHaker commented 6 years ago

This is still the encoding problem. You can step through with a debugger to find where the file is written with one encoding and read with another.
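One way to narrow this down without a debugger is to read the result file as raw bytes and see which encodings can decode it. The following is only a sketch under assumptions: the function name `find_working_encodings` and the candidate list are illustrative, and the demo creates its own temporary file instead of using the real ner_predict.utf8.

```python
import os
import tempfile


def find_working_encodings(path, candidates=("utf-8", "gbk", "gb18030")):
    """Return the candidate encodings that decode the whole file without error."""
    with open(path, "rb") as f:
        raw = f.read()
    working = []
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        working.append(encoding)
    return working


# Demo on a temporary file holding GBK-encoded bytes (a fullwidth comma).
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    with open(demo_path, "wb") as f:
        f.write("\uff0c O\n".encode("gbk"))
    result = find_working_encodings(demo_path)
finally:
    os.remove(demo_path)
print(result)
```

A successful decode only means the bytes are valid in that encoding, not that the text is right, so this check is best used to rule encodings out. If "utf-8" is missing from the result, the file was not written as UTF-8.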

SanSLee commented 6 years ago

@yyHaker Thanks!

LiXuanming commented 6 years ago

This is an encoding problem. If you are working on Linux, convert the file's encoding with Notepad++. If you are working on Windows, open the file like this:

    import codecs
    with codecs.open(filename, 'r', 'utf-8') as f:
        ...  # your processing goes here

ghost commented 5 years ago

It is very easy. You just need to change 'utf-8' to 'gbk' in 'return_report', which the traceback shows is defined in 'conlleval.py' and called from 'utils.py'.
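Switching the reader to 'gbk' only helps on machines where the file was written with a GBK-family default. As a rough sketch (the helper names `default_text_encoding` and `reader_matches_default` are hypothetical), you can check what encoding the standard library reports as the platform default, which is what `open()` generally falls back to when no `encoding` argument is passed:

```python
import codecs
import locale


def default_text_encoding():
    """Return the locale's preferred encoding, the usual fallback for open()."""
    return locale.getpreferredencoding(False)


def reader_matches_default(reader_encoding):
    """Check whether reader_encoding names the same codec as the platform default."""
    reader_codec = codecs.lookup(reader_encoding).name
    default_codec = codecs.lookup(default_text_encoding()).name
    return reader_codec == default_codec


print(default_text_encoding())
print(reader_matches_default("utf-8"))
```

If the second line prints False, a file written without an explicit encoding may not decode as UTF-8. Writing the file with `encoding='utf8'`, as suggested earlier in the thread, avoids tying the code to one platform's default.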