thunlp / THULAC-Python

An Efficient Lexical Analyzer for Chinese
MIT License
2.02k stars, 336 forks

Why does this problem still occur when the txt file is encoded as UTF-8? #98

Open PhilrainV opened 4 years ago

PhilrainV commented 4 years ago

UnicodeDecodeError: 'gbk' codec can't decode byte 0xa8 in position 0: incomplete multibyte sequence

kathy98443 commented 4 years ago

Did this happen while you were processing a file? What does your full code look like?

fanrongqitiancai commented 2 years ago

I'm running into this problem here too. My code is as follows:

import thulac
import codecs

thu1 = thulac.thulac()
thu1.cut_f("input.txt", "output.txt")
print('end')
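A possible workaround, offered as a sketch rather than a confirmed fix: the 'gbk' in the traceback suggests the file is being opened with the platform's default encoding (GBK on a Chinese-locale Windows machine) instead of UTF-8. Assuming that is the cause, you can open the files yourself with an explicit encoding and segment line by line with `cut(..., text=True)` instead of calling `cut_f`. The helper name `cut_file_utf8` and its `cut` parameter are made up for illustration and are not part of THULAC.

```python
def cut_file_utf8(cut, input_path, output_path):
    """Segment a UTF-8 text file line by line and write UTF-8 output.

    `cut` is any callable that maps one line of text to its segmented
    string. This helper is hypothetical, not part of THULAC.
    """
    # Pass the encoding explicitly so the platform default (e.g. GBK)
    # is never used for decoding or encoding.
    with open(input_path, encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            # Keep blank lines as-is instead of passing them to the segmenter.
            dst.write((cut(line) if line else "") + "\n")


# Intended usage with THULAC (requires the thulac package to be installed):
#
#   import thulac
#   thu1 = thulac.thulac()
#   cut_file_utf8(lambda s: thu1.cut(s, text=True), "input.txt", "output.txt")
```

If you would rather keep calling `cut_f`, running the script in Python's UTF-8 mode (`python -X utf8 script.py`, or setting the environment variable `PYTHONUTF8=1`, available since Python 3.7) makes `open()` default to UTF-8, which may also avoid the error.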