kevincobain2000 / jProcessing

Japanese Natural Language Processing Libraries
http://readthedocs.org/docs/jprocessing/en/latest/
BSD 2-Clause "Simplified" License
148 stars 30 forks

Execution Error for classifier.baseline function #5

Closed: NaveenSrikanth closed this issue 6 years ago

NaveenSrikanth commented 8 years ago

Dear Kevincobain,

First, thank you very much for posting the step-by-step process for classifying Japanese sentiment. I tried to replicate what you did.

I used your code below:

    from jNlp.jSentiments import *
    jp_wn = '../../../../data/wnjpn-all.tab'
    en_swn = '../../../../data/SentiWordNet_3.0.0_20100908.txt'
    classifier = Sentiment()
    classifier.train(en_swn, jp_wn)
    text = u'監督、俳優、ストーリー、演出、全部最高!'

Everything worked fine up to this point. But when I tried the following statement:

print classifier.baseline(text)

I got the following error:

    Traceback (most recent call last):
      File "", line 1, in
      File "build/bdist.linux-i686/egg/jNlp/jSentiments.py", line 55, in baseline
      File "build/bdist.linux-i686/egg/jNlp/jSentiments.py", line 48, in polarScores_text
      File "build/bdist.linux-i686/egg/jNlp/jTokenize.py", line 30, in jTokenize
      File "", line 124, in XML
    cElementTree.ParseError: not well-formed (invalid token): line 1, column 9

Please help me resolve this issue and tell me what I am doing wrong.

However, when I classify the sentiment of individual words, it works properly.

Please help me resolve this issue.

kevincobain2000 commented 8 years ago

This seems like an encoding error. Try passing in a UTF-8 encoded string, or you may need to adjust your terminal settings if you ran the code from the Python CLI.
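A minimal sketch of the first suggestion, assuming `baseline` accepts a UTF-8 encoded byte string (the thread does not confirm this). The helper name `to_utf8_bytes` is hypothetical and not part of jNlp, and the jNlp call itself is left as a comment because it needs the trained classifier from the question.

```python
# -*- coding: utf-8 -*-

def to_utf8_bytes(text):
    """Return `text` as UTF-8 encoded bytes (hypothetical helper, not part of jNlp)."""
    if isinstance(text, bytes):
        # Already a byte string; assume it is UTF-8 and pass it through unchanged.
        return text
    return text.encode('utf-8')

text = u'監督、俳優、ストーリー、演出、全部最高!'
encoded = to_utf8_bytes(text)

# Assumption: with the classifier trained as in the question, one would then try
#     print classifier.baseline(encoded)
```

If the encoded string gives the same traceback, the terminal settings mentioned in the reply are the next thing to check.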

http://stackoverflow.com/questions/13046240/parseerror-not-well-formed-invalid-token-using-celementtree
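For context on the linked question, a "not well-formed (invalid token)" error can be reproduced with the standard library alone. The sketch below is an assumption about the failure mode, not a confirmed diagnosis of jNlp. It feeds the XML parser bytes that are not valid UTF-8 (Shift_JIS here, chosen only for illustration) and compares the result with the same text encoded as UTF-8. It is written for Python 3's `xml.etree.ElementTree` rather than the Python 2 `cElementTree` in the traceback, and the helpers `wrap_in_xml` and `parses_cleanly` are hypothetical.

```python
import xml.etree.ElementTree as ET

def wrap_in_xml(payload):
    """Wrap raw bytes in a minimal XML element with no encoding declaration."""
    return b'<r>' + payload + b'</r>'

def parses_cleanly(document):
    """Return True if the parser accepts `document`, False on a ParseError."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError as err:
        print('ParseError:', err)
        return False

text = u'監督'

# Without an encoding declaration the parser assumes UTF-8.
utf8_ok = parses_cleanly(wrap_in_xml(text.encode('utf-8')))
sjis_ok = parses_cleanly(wrap_in_xml(text.encode('shift_jis')))
```

Here `utf8_ok` ends up `True` and `sjis_ok` ends up `False`, which suggests checking what encoding the text has by the time it reaches the tokenizer's XML parsing step.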