Closed amacanovic closed 2 years ago
If you check out ?lexicon::hash_sentiment_huliu, you'll see that this data set is
"an augmented version of Hu & Liu's (2004) positive/negative word list as sentiment lookup values".
These were additions that I came across as I worked with the data set that I felt needed to be included.
Hello Tyler, all,
First of all, thank you for the amazing work on this package!
I have a question regarding the Hu & Liu (2004) dictionary loaded into the package. A simple frequency check of the polarity values returned the following distribution:
Total observations in table: 6874
| Polarity | -2 | -1.05 | -1 | 0 | 1 |
| Count | 7 | 6 | 4824 | 13 | 2024 |
So it appears that 7 entries have a polarity of -2, 6 have a polarity of -1.05, and 13 have a polarity of 0. For example, "i wish" and "unduly" both carry a weight of -2, while "is like" and "i'm like" carry a weight of 0.
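For reference, the frequency check described above can be sketched in R roughly as follows. This is only a minimal sketch: it runs on a small toy stand-in table so that it works without the package installed, and it assumes the lexicon stores terms in a column `x` and polarity values in a column `y`. The entries "good" and "bad" are illustrative placeholders; the other four rows use the weights quoted above.

```r
# Toy stand-in for lexicon::hash_sentiment_huliu.
# Assumption: terms live in column `x`, polarity values in column `y`.
lex <- data.frame(
  x = c("good", "bad", "i wish", "unduly", "is like", "i'm like"),
  y = c(1, -1, -2, -2, 0, 0),
  stringsAsFactors = FALSE
)
# With the package installed, the real table could be swapped in instead:
# lex <- as.data.frame(lexicon::hash_sentiment_huliu)

# Count how many entries carry each polarity value.
polarity_counts <- table(lex$y)
print(polarity_counts)

# List the entries whose weight is anything other than -1 or 1.
unusual <- lex[!lex$y %in% c(-1, 1), ]
print(unusual)
```

On the real table, `unusual` should be short enough to inspect by eye.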
I consulted the dictionary available at https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html, and none of these entries seem to appear there. In fact, I did not see any two-word items in that dictionary at all.
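One way to pull out the multi-word entries for comparison against the original list is to look for terms containing a space. The sketch below again uses a toy stand-in table and assumes the terms are stored in a column `x`; "good" is an illustrative placeholder.

```r
# Toy stand-in for the lexicon table (assumed columns: x = term, y = polarity).
lex <- data.frame(
  x = c("good", "i wish", "unduly", "is like", "i'm like"),
  y = c(1, -2, -2, 0, 0),
  stringsAsFactors = FALSE
)

# A term is treated as multi-word if it contains at least one space.
has_space <- grepl(" ", lex$x, fixed = TRUE)
multiword_terms <- lex$x[has_space]
print(multiword_terms)
```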
While these entries do make sense, I wanted to ask why they are included and whether the weights are accurate and as intended. How are these bigrams accounted for in the calculation? Or am I doing something wrong?
Thanks!