Closed evanhasnoclue closed 6 years ago
You probably have a different default encoding in your terminal. Try putting a u
in front of your string so that it is a Unicode string rather than a byte string:
s = u'NLPIR分词系统前身为2000年发布的ICTCLAS词法分析系统,从2009年开始,为了和以前工作进行大的区隔,并推广NLPIR自然语 言处理与信息检索共享平台,调整命名为NLPIR分词系统。'
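If the text reaches you as bytes (read from a file or a pipe, say) instead of as a literal, decoding it explicitly should have the same effect as the u prefix. This is only a minimal sketch: it assumes the bytes are UTF-8, and the names raw and text are just for illustration, so substitute whatever encoding your terminal or file really uses.

```python
# -*- coding: utf-8 -*-
# Sketch: turn a byte string into a Unicode string before segmenting it.
# Assumption: the bytes are UTF-8 encoded; use 'gbk' etc. if that is what you have.

raw = u"分词系统".encode("utf-8")  # stand-in for bytes coming from a file or terminal
text = raw.decode("utf-8")         # Unicode string, equivalent to u'分词系统'

# Each of the four characters takes three bytes in UTF-8.
print(len(raw), len(text))
```

The decoded text can then be passed to pynlpir.segment in place of the u'...' literal.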
@tsroten Thanks. It works! But my result is all escape codes... What can I do to turn it into Chinese characters? I've tried s.encode('utf8') and 'gbk' many times, but it didn't work. Thanks for helping me out!
pynlpir.segment(s)
[(u'NLPIR', u'noun'), (u'\u5206\u8bcd', u'verb'), (u'\u7cfb\u7edf', u'noun'), (u'\u524d\u8eab', u'noun'), (u'\u4e3a', u'preposition'), (u'2000\u5e74', u'time word'), (u'\u53d1\u5e03', u'verb'), (u'\u7684', u'particle'), (u'ICTCLAS', u'noun'), (u'\u8bcd\u6cd5', u'noun'), (u'\u5206\u6790', u'verb'), (u'\u7cfb\u7edf', u'noun'), (u'\uff0c', u'punctuation mark'), (u'\u4ece', u'preposition'), (u'2009\u5e74', u'time word'), (u'\u5f00\u59cb', u'verb'), (u'\uff0c', u'punctuation mark'), (u'\u4e3a\u4e86', u'preposition'), (u'\u548c', u'conjunction'), (u'\u4ee5\u524d', u'noun of locality'), (u'\u5de5\u4f5c', u'verb'), (u'\u8fdb\u884c', u'verb'), (u'\u5927', u'adjective'), (u'\u7684', u'particle'), (u'\u533a', u'noun'), (u'\u9694', u'verb'), (u'\uff0c', u'punctuation mark'), (u'\u5e76', u'conjunction'), (u'\u63a8\u5e7f', u'verb'), (u'NLPIR', u'noun'), (u'\u81ea\u7136', u'noun'), (u'\u8bed', u'noun'), (u' ', None), (u'\u8a00', u'verb'), (u'\u5904\u7406', u'verb'), (u'\u4e0e', u'preposition'), (u'\u4fe1\u606f', u'noun'), (u'\u68c0\u7d22', u'verb'), (u'\u5171\u4eab', u'verb'), (u'\u5e73\u53f0', u'noun'), (u'\uff0c', u'punctuation mark'), (u'\u8c03\u6574', u'verb'), (u'\u547d\u540d', u'verb'), (u'\u4e3a', u'verb'), (u'NLPIR', u'noun'), (u'\u5206\u8bcd', u'verb'), (u'\u7cfb\u7edf', u'noun'), (u'\u3002', u'punctuation mark')]
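The strings in the response are Unicode. When the interpreter displays the whole list,
it shows each string's repr, which escapes non-ASCII characters as \uXXXX codes, so
you'll want to print the strings one at a time if you want to read them: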
segments = pynlpir.segment(s)
for word, part_of_speech in segments:
    # Printing a single string shows its characters instead of \uXXXX escapes.
    print(word)
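If you would rather see each word together with its tag, something along these lines might help. It is only a sketch: format_segments is a hypothetical helper, not part of pynlpir, and the sample list is copied from the first three pairs of the output above so that the snippet runs without the library installed.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def format_segments(segments):
    """Join (word, part_of_speech) pairs into one readable 'word/tag' string.

    Hypothetical helper for illustration; pynlpir does not provide it.
    """
    return " ".join("{}/{}".format(word, pos) for word, pos in segments)


# First three pairs from the pynlpir.segment(s) output shown earlier.
segments = [("NLPIR", "noun"), ("\u5206\u8bcd", "verb"), ("\u7cfb\u7edf", "noun")]

# Printing the joined string shows the characters themselves, because print
# writes the string's contents rather than the repr of the enclosing list.
print(format_segments(segments))
```

With real data, you would pass the result of pynlpir.segment(s) instead of the sample list. A pair whose tag is None, such as the space in the output above, would come out with the text "None" after the slash.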
Here is what I entered and what I got... I tried all three encodings, but none of them worked... I'm so confused about the encoding thing... Please help me out!