markuskiller / textblob-de

German language support for TextBlob.
https://textblob-de.readthedocs.org
MIT License

Sentiment for text + space + period #5

Closed: Hocdoc closed this issue 10 years ago

Hocdoc commented 10 years ago

#4 works now, but I still have problems with texts like TextBlobDE(u"A .").sentiment:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/textblob/decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/lib/python2.7/site-packages/textblob_de/blob.py", line 629, in sentiment
    _polarity += s.polarity
  File "/usr/lib/python2.7/site-packages/textblob/decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/lib/python2.7/site-packages/textblob_de/blob.py", line 395, in polarity
    return self.sentiment[0]
  File "/usr/lib/python2.7/site-packages/textblob/decorators.py", line 24, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/lib/python2.7/site-packages/textblob_de/blob.py", line 387, in sentiment
    return self.analyzer.analyze(self.raw)
  File "/usr/lib/python2.7/site-packages/textblob_de/sentiments.py", line 142, in analyze
    text = self._lemmatize(text)
  File "/usr/lib/python2.7/site-packages/textblob_de/sentiments.py", line 147, in _lemmatize
    _lemmas = self.lemmatizer.lemmatize(raw)
  File "/usr/lib/python2.7/site-packages/textblob_de/lemmatizers.py", line 64, in lemmatize
    if w[0].isupper() and i > 0:
IndexError: string index out of range
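
For readers following the traceback: the last frame indexes `w[0]` on each token, which fails as soon as one token is an empty string. The snippet below is only a minimal sketch of that failure mode and of one possible guard. It is not the actual textblob-de code or its fix; the token list and the helper name `capitalized_non_initial` are hypothetical, and it is an assumption that the lemmatizer produced an empty token for this input.

```python
def capitalized_non_initial(tokens):
    """Return tokens that start with an uppercase letter and are not first.

    Hypothetical helper that mirrors the condition in the traceback,
    with a guard against empty tokens added.
    """
    hits = []
    for i, w in enumerate(tokens):
        # `w and ...` short-circuits on "", so w[0] is never evaluated
        # for an empty string.
        if w and w[0].isupper() and i > 0:
            hits.append(w)
    return hits


# Assumed token list containing an empty string (illustrative only).
tokens = ["A", "", "."]

# Unguarded version of the same condition: raises on the empty token.
unguarded_error = None
try:
    [w for i, w in enumerate(tokens) if w[0].isupper() and i > 0]
except IndexError as exc:
    unguarded_error = str(exc)

print(unguarded_error)
print(capitalized_non_initial(tokens))
```

The guard simply skips empty tokens. Filtering them out before the loop would work as well.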

Thanks for your bug-fixing work!

markuskiller commented 10 years ago

Latest release on PyPI (0.3.0) reflects the changes: https://pypi.python.org/pypi/textblob-de

Hocdoc commented 10 years ago

Thanks, sentiment analysis is now very stable. I have just tested it with about 10,000 short German texts (random Facebook comments) and it now works without problems. :)
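
A bulk check like this can be sketched as a small loop that records which inputs raise. The helper `smoke_test`, the stand-in `fragile_analyze`, and the sample strings are all hypothetical and exist only to keep the example self-contained. This is not the script that was actually used.

```python
def smoke_test(texts, analyze):
    """Call analyze(text) for every text and collect (text, exception) pairs."""
    failures = []
    for text in texts:
        try:
            analyze(text)
        except Exception as exc:  # broad on purpose: we want every crash
            failures.append((text, exc))
    return failures


def fragile_analyze(text):
    # Stand-in analyzer (assumption): counts capitalized tokens and
    # crashes on empty tokens, similar in spirit to the reported bug.
    return sum(1 for w in text.split(" ") if w[0].isupper())


samples = ["A .", "Das ist gut.", "A  ."]  # the last one has a double space
failures = smoke_test(samples, fragile_analyze)

for text, exc in failures:
    print(repr(text), type(exc).__name__, exc)
```

With textblob-de installed, the same loop could be run with `lambda t: TextBlobDE(t).sentiment` as the `analyze` argument.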

markuskiller commented 10 years ago

Thanks for your feedback. The quality of the polarity values is directly linked to the quality of lemmatization. I'm currently working on a TextBlob implementation of RFTagger, and the results of its lemmatizer look promising. I'll let you know as soon as it is ready.