kevincobain2000 / jProcessing

Japanese Natural Language Processing Libraries
http://readthedocs.org/docs/jprocessing/en/latest/
BSD 2-Clause "Simplified" License

UnicodeDecodeError with classifier.baseline() #14

Status: Open. jcneshi opened this issue 6 years ago.

jcneshi commented 6 years ago

This issue is similar to, but different from, another one posted here.

$ python jnlp-test-sentencePolarityScore.py
Traceback (most recent call last):
  File "jnlp-test-sentencePolarityScore.py", line 9, in <module>
    print classifier.baseline(text)
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jSentiments.py", line 56, in baseline
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jSentiments.py", line 49, in polarScores_text
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jTokenize.py", line 30, in jTokenize
  File "build/bdist.macosx-10.13-intel/egg/jNlp/jCabocha.py", line 27, in cabocha
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb5 in position 105: invalid start byte
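The failing byte is a useful clue. In UTF-8, 0xb5 is only valid as a continuation byte, so it can never start a character, hence "invalid start byte". In EUC-JP, bytes 0xa1 to 0xfe begin two-byte characters, and EUC-JP is typically the default charset of MeCab's IPA dictionary and of CaboCha unless they are configured otherwise. So output from a stage still using EUC-JP, decoded as UTF-8, would fail in exactly this way. The minimal Python 3 sketch below (the reporter's script is Python 2; `guess_encoding` is a hypothetical helper, not part of jNlp) tries a few candidate codecs on raw bytes to see which one the tool is actually emitting.

```python
def guess_encoding(raw, candidates=("utf-8", "euc-jp", "shift_jis")):
    """Return the first codec in `candidates` that decodes `raw` cleanly, else None.

    Hypothetical diagnostic helper: a clean decode only shows the bytes are
    *valid* in that codec, not that it is certainly the intended one.
    """
    for codec in candidates:
        try:
            raw.decode(codec)
        except UnicodeDecodeError:
            continue  # not valid in this codec, try the next one
        return codec
    return None


if __name__ == "__main__":
    # Reproduce the error class from the traceback: 0xb5 cannot start a UTF-8 character.
    try:
        b"abc\xb5".decode("utf-8")
    except UnicodeDecodeError as err:
        print(err.reason)  # invalid start byte

    # Japanese text encoded as EUC-JP is not valid UTF-8, but decodes as EUC-JP.
    print(guess_encoding("日本語".encode("euc-jp")))  # euc-jp
    print(guess_encoding("日本語".encode("utf-8")))   # utf-8
```

Feeding it the raw bytes the tokenizer receives (before any `.decode()` call) would indicate whether that stage is still producing EUC-JP.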

At first I also got this error with classifier.train(), but it disappeared once I ran ./configure --with-charset=utf8 for both the MeCab dictionary and CaboCha.
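Reconfiguring with --with-charset=utf8 helps because every stage has to agree on one charset: the text Python sends in, the dictionary the tool was built against, and the bytes Python decodes on the way out. As a minimal Python 3 sketch, assuming the tagger is invoked as an external command (jNlp's jCabocha.py may call it differently; `run_tagger` is a hypothetical helper, not part of jNlp), the charset can be made one explicit parameter used in both directions:

```python
import subprocess
import sys


def run_tagger(cmd, text, charset="utf-8"):
    """Send `text` to an external tagger and decode its output.

    Hypothetical helper: `cmd` would be something like ["cabocha", "-f1"],
    and `charset` must match the charset the tool and its dictionary were
    built with (e.g. "euc-jp" for a build left at its default).
    """
    result = subprocess.run(
        cmd,
        input=text.encode(charset),  # encode input in the tool's charset
        stdout=subprocess.PIPE,
        check=True,                  # raise if the tool exits non-zero
    )
    return result.stdout.decode(charset)  # decode output with the same charset


if __name__ == "__main__":
    # Stand-in "tagger" that just echoes stdin, so the round trip can be
    # checked without CaboCha installed.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]
    print(run_tagger(echo, "日本語のテキスト", charset="euc-jp"))
```

Keeping the charset in one place means a build left at EUC-JP fails loudly at the configured value rather than in a decode deep inside the tokenizer.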

However, with classifier.baseline() the error remains. Is there another part of the toolchain that I need to configure for UTF-8? Am I missing something really basic?
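One way to check whether a stage really was rebuilt for UTF-8 is to ask MeCab which charset its installed dictionary uses; `mecab -D` (`--dictionary-info`) prints dictionary information. The Python 3 sketch below assumes that output contains a line of the form `charset:<tab>utf8`; verify this against your own install. `parse_charset` and `mecab_dictionary_charset` are hypothetical helpers, and this check does not cover CaboCha itself.

```python
import subprocess


def parse_charset(dictionary_info):
    """Extract the value of the `charset:` line from `mecab -D` style output.

    Assumes lines of the form "key:<whitespace>value"; returns None if absent.
    """
    for line in dictionary_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "charset":
            return value.strip().lower()
    return None


def mecab_dictionary_charset(mecab="mecab"):
    """Run `mecab -D` and return the charset its default dictionary reports."""
    out = subprocess.run([mecab, "-D"], stdout=subprocess.PIPE, check=True).stdout
    # The keys and charset value are ASCII; errors="replace" guards e.g. odd paths.
    return parse_charset(out.decode("ascii", errors="replace"))


if __name__ == "__main__":
    # Illustrative sample text, not captured output.
    sample = "filename:\tsys.dic\ncharset:\tutf8\ntype:\t0\n"
    print(parse_charset(sample))  # utf8
```

If this reports something other than utf8, Python is still receiving bytes from a non-UTF-8 dictionary.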

Thanks!

jcneshi commented 6 years ago

By the way, my jnlp-test-sentencePolarityScore.py script uses the example code from section 1.4.2 of the documentation: http://jprocessing.readthedocs.io/en/latest/#how-to-use

kevincobain2000 commented 6 years ago

Hi, is this issue fixed?