kevincobain2000 / jProcessing

Japanese Natural Language Processing Libraries
http://readthedocs.org/docs/jprocessing/en/latest/
BSD 2-Clause "Simplified" License
148 stars 30 forks

Unsafe sentence tokenizer in sentiment analysis #8

Closed: renoust closed this issue 6 years ago

renoust commented 8 years ago

Hi!

In jSentiments.py, in polarScores_text(), you are processing each sentence by:

for sent in text.split(u'。'):
    etc.

This crashes when an empty sentence comes in (for example, when the text ends with '。', split() returns a trailing empty string). We can guard against that with:

for sent in text.split(u'。'):
    # skip empty or whitespace-only sentences
    if len(sent.strip()) == 0:
        continue
    etc.
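
To illustrate why the guard matters, here is a minimal standalone sketch. The helper name iter_sentences and the sample text are made up for this example and are not part of jProcessing; it only shows how splitting on '。' produces empty pieces and how skipping them behaves.

```python
# Hypothetical helper, not part of jSentiments.py: it only demonstrates
# the empty-sentence guard around a split on the Japanese full stop.
def iter_sentences(text, delimiter=u'。'):
    """Yield the non-empty, stripped sentences found in text."""
    for sent in text.split(delimiter):
        sent = sent.strip()
        if not sent:
            # Empty or whitespace-only piece, e.g. after a trailing delimiter.
            continue
        yield sent


if __name__ == '__main__':
    sample = u'今日は晴れです。 。明日は雨です。'
    # A plain split keeps a whitespace-only piece and a trailing empty string.
    print(sample.split(u'。'))
    # The guarded version yields only the two real sentences.
    print(list(iter_sentences(sample)))
```

Stripping before the check also catches pieces that contain only whitespace, which a plain comparison against the empty string would miss.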