trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

Some words in Hu&Liu dict have off polarity values #114

Closed: amacanovic closed this issue 2 years ago

amacanovic commented 4 years ago

Hello Tyler, all,

First of all, thank you for the amazing work on this package!

I have a question about the Hu & Liu (2004) dictionary bundled with the package. A simple frequency count of its polarity values returned the following distribution:

Total observations in table: 6874

| Polarity | -2 | -1.05 | -1 | 0 | 1 |
|---|---|---|---|---|---|
| Count | 7 | 6 | 4824 | 13 | 2024 |

So it appears that 7 entries have a polarity of -2, 6 have a polarity of -1.05, and 13 have a polarity of 0. For example, "i wish" and "unduly" both carry a weight of -2, while "is like" and "i'm like" carry a weight of 0.
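A frequency check like this is just a tabulation of the lexicon's polarity column. The sketch below reproduces the idea in Python, assuming the lexicon has been exported as (term, polarity) pairs; it is not how the check above was run. The toy entries are a hypothetical subset: only "i wish", "unduly", "is like" and "i'm like" with their values come from this thread, and "good" and "bad" are illustrative placeholders. It also lists multi-word keys, which is a quick way to spot bigrams in the data.

```python
from collections import Counter

# Hypothetical toy subset of a polarity lexicon as (term, polarity) pairs.
# Only "unduly", "i wish", "is like" and "i'm like" (and their values) are
# quoted in the issue; "good" and "bad" are illustrative placeholders.
lexicon = [
    ("good", 1.0),
    ("bad", -1.0),
    ("unduly", -2.0),
    ("i wish", -2.0),
    ("is like", 0.0),
    ("i'm like", 0.0),
]

def polarity_counts(entries):
    """Count how many lexicon entries carry each polarity value."""
    return Counter(value for _, value in entries)

def multiword_terms(entries):
    """Return keys containing a space, i.e. n-grams rather than single words."""
    return [term for term, _ in entries if " " in term]

counts = polarity_counts(lexicon)
for value in sorted(counts):
    print(f"{value:>6}: {counts[value]}")
print("multi-word entries:", multiword_terms(lexicon))
```

Run against a full export of the dictionary, the same two functions would give the distribution in the table and the list of two-word items that do not appear in the original Hu & Liu word list.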

Consulting the dictionary available at https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html, none of these appear to be there. In fact, I did not see any two-word items in that dictionary at all.

These entries do make sense, but I wanted to ask why they are included and whether these weights are accurate and intended. How are these bigrams accounted for in the calculation? Or am I doing something wrong?
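One generic way a dictionary-based scorer can honor two-word keys is greedy longest-match lookup: at each position, try the bigram first and fall back to the single word. The sketch below illustrates that idea only; it is an assumption, not sentimentr's actual implementation. The lexicon is a hypothetical toy subset (the values for "wish" and "like" are made up for illustration), tokenization is plain whitespace splitting, and the score is a bare sum that ignores the valence shifters sentimentr applies.

```python
# Hypothetical toy lexicon; keys may be one or two lowercase tokens.
LEXICON = {
    "wish": 1.0,      # hypothetical single-word value, for illustration only
    "i wish": -2.0,   # bigram value quoted in the issue
    "like": 1.0,      # hypothetical single-word value, for illustration only
    "is like": 0.0,   # bigram value quoted in the issue
}

def match_terms(text, lexicon, max_n=2):
    """Greedy longest-match lookup: prefer an n-gram key over its single words."""
    tokens = text.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(max_n, 0, -1):
            window = tokens[i:i + n]
            key = " ".join(window)
            # Only accept a full-length window so short tails are not re-read.
            if len(window) == n and key in lexicon:
                matches.append((key, lexicon[key]))
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no lexicon entry starts at this token
    return matches

def raw_score(text, lexicon):
    """Sum matched polarities (no valence shifters, no normalization)."""
    return sum(value for _, value in match_terms(text, lexicon))
```

With this design, "I wish it is like that" matches "i wish" and "is like" as whole units, so "wish" and "like" are never scored on their own and the bigram weights override the single-word ones.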

Thanks!

trinker commented 2 years ago

If you check out ?lexicon::hash_sentiment_huliu you'll see that this data set is:

an augmented version of Hu & Liu's (2004) positive/negative word list as sentiment lookup values

These are additions I came across while working with the data set and felt needed to be included.