nunoachenriques / vader-sentiment-analysis

Java port of Python NLTK sentiment VADER module. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
Apache License 2.0

Analysis with "so" before a qualifier in a sentence with negation gives wrong results #1

Status: Open — Nek opened this issue 7 years ago

Nek commented 7 years ago

Thanks for picking this project up. It's a good fit for an app prototype I'm building. Right now I'm playing with the library from a Clojure REPL. It works fine, except there are some problems with "so".

moodwiz.core=> (calculate-sentiment "good")
{"negative" 0.0, "neutral" 0.0, "positive" 1.0, "compound" 0.4404}
moodwiz.core=> (calculate-sentiment "so good")
{"negative" 0.0, "neutral" 0.238, "positive" 0.762, "compound" 0.4927}
moodwiz.core=> (calculate-sentiment "I feel good")
{"negative" 0.0, "neutral" 0.256, "positive" 0.744, "compound" 0.4404}
moodwiz.core=> (calculate-sentiment "I feel so good")
{"negative" 0.0, "neutral" 0.385, "positive" 0.615, "compound" 0.4927}
moodwiz.core=> (calculate-sentiment "I don't feel good")
{"negative" 0.546, "neutral" 0.454, "positive" 0.0, "compound" -0.3412}
moodwiz.core=> (calculate-sentiment "I don't feel so good")
{"negative" 0.0, "neutral" 0.445, "positive" 0.555, "compound" 0.5777}
nunoachenriques commented 7 years ago

Hi Nek, this is not an issue in this project code. I've tested[1] it and it's an issue with the original Python implementation from Hutto and the NLTK team. I recommend that you point it out to them so we can all benefit from the improvement.

Anyway, thanks a lot for the report! I'll mark it as an improvement to be scheduled. And yes, I also picked VADER because I'm integrating it with another project :-)

Cheers!

[1]

Python 3.5.3 (default, Jan 19 2017, 14:11:04) 
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.sentiment.vader import SentimentIntensityAnalyzer
>>> sid = SentimentIntensityAnalyzer()
>>> ss = sid.polarity_scores("I don't feel so good")
>>> for k in sorted(ss):
...         print('{0}: {1}\t'.format(k, ss[k]), end='')
... 
compound: 0.5777    neg: 0.0    neu: 0.445  pos: 0.555
apanimesh061 commented 7 years ago

Hi, I looked into this issue. It arises from the line implementing the "so"/"this" special-case check (the original comment linked to the Java line and to the corresponding rule in the Python implementation).

If you use a sentence like "I don't feel completely good", you get the correct result, i.e. {negative=0.466, neutral=0.534, positive=0.0, compound=-0.3865}. If you have "so" in place of "completely", the valence is multiplied by 1.25; otherwise it is multiplied by -0.74 (the negation scalar).
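To make the behaviour concrete, here is a minimal sketch of the scoring step as described in this thread. The function name, the argument shapes, and the exact structure of the condition are assumptions for illustration; only the two multipliers (1.25 and -0.74) come from the discussion above.

```python
N_SCALAR = -0.74       # negation multiplier mentioned above
SO_THIS_BOOST = 1.25   # "so"/"this" special-case multiplier mentioned above


def adjust_valence(words, i, valence, negated):
    """Adjust the valence of the lexicon word at index i (illustrative sketch).

    The special case intended for "never so/this ..." also fires whenever
    the word right before the lexicon word is "so" or "this", regardless
    of which negation word came earlier, so the negation scalar is never
    applied.
    """
    if words[i - 1] in ("so", "this"):
        return valence * SO_THIS_BOOST
    if negated:
        return valence * N_SCALAR
    return valence


# "I don't feel so good": "good" is at index 4 and the clause is negated,
# but the "so" branch wins, so the score stays (boosted) positive.
words = ["i", "don't", "feel", "so", "good"]
print(adjust_valence(words, 4, 1.9, negated=True))   # 2.375 instead of -1.406
```

With "completely" (or any non-trigger word) in place of "so", the first branch is skipped and the negation scalar flips the sign as expected, which matches the correct result quoted above.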

I made a few changes here but they will break tests and also some checkstyle rules.

I basically added a rule that checks whether the trigram has the form <negative word> <some word | so | this> <so | this> and, if it does, adjusts the score accordingly. Previously, we were only handling <never> <some word | so | this> <so | this>. After this change you get {negative=0.466, neutral=0.534, positive=0.0, compound=-0.3865}.
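The rule change described above can be sketched roughly as follows. The identifiers and the (deliberately partial) negation set are illustrative, not the actual patch; the point is that the 1.25 boost is reserved for trigrams starting with "never", while other negations fall through to the ordinary negation scalar.

```python
# Illustrative, partial negation list -- the real library uses a larger one.
NEGATIONS = {"never", "not", "don't", "dont", "doesn't", "didn't"}


def special_case_multiplier(words, i):
    """Return the multiplier for the lexicon word at index i when the
    preceding trigram ends in "so"/"this", or None if the special case
    does not apply (illustrative sketch of the proposed rule)."""
    if i < 3 or words[i - 1] not in ("so", "this"):
        return None
    first = words[i - 3]          # first word of the trigram before index i
    if first == "never":
        return 1.25               # keep the original "never so/this" boost
    if first in NEGATIONS:
        return -0.74              # other negations get the negation scalar
    return None


words = ["i", "don't", "feel", "so", "good"]
print(special_case_multiplier(words, 4))   # -0.74: the negation now wins
```

With this shape, "never felt so good" still gets the boost, while "don't feel so good" comes out negative, matching the corrected scores quoted above.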

nunoachenriques commented 7 years ago

Hi Animesh, thanks for taking the time to try a fix/enhancement to this issue.

I took a look at your changes. I believe it would be better to recode it without breaking the tests... ;-) Otherwise, we will have to guarantee (by extensive testing) that the new implementation is indeed better than the original by Hutto!

Cheers, and tell me what you think...