trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

role of commas #72

scheddy closed this issue 6 years ago

scheddy commented 6 years ago

Thank you very much for your package and for making your work available to others!

I'm a bit puzzled by the effect of commas in a sentence on the result of the sentiment function:

> sentiment("the the, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.4330127
> sentiment("the, the, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.8660254

Why does the sentiment score double when the sentence has two commas instead of one?

> sentiment("the not, bad")
   element_id sentence_id word_count sentiment
1:          1           1          3 0.4330127

Why does the negator "not" switch the sign? I understood from the documentation that a comma limits the context of "bad", so that a valence shifter on the other side of the comma has no effect.

> sentiment("the, not, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.8660254

Adding another comma gives the expected sign.

I'm using the current CRAN version of your package (2.0.1) with the default word lists from the lexicon package (version 0.6.3) on a Windows machine with R version 3.4.2.

trinker commented 6 years ago

This has been fixed via a prior commit in the development version:

if (!require("pacman")) install.packages("pacman")
pacman::p_load_current_gh(file.path('trinker', c('lexicon', 'textclean', 'sentimentr')))

> sentiment("the the, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.4330127
> 
> sentiment("the, the, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.4330127
> 
> sentiment("the not, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.4330127
> 
> sentiment("the, not, bad")
   element_id sentence_id word_count  sentiment
1:          1           1          3 -0.4330127
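
As a sanity check on the numbers in this thread, here is a minimal base R sketch of the arithmetic. It assumes that sentimentr scales a sentence's summed polarity by the square root of its word count, and that "bad" has a polarity of -0.75. That polarity is inferred from the outputs above, not looked up in the lexicon, and the variable names are only illustrative.

```r
# Assumption: polarity of "bad" is -0.75 (inferred from the thread's output).
polarity_bad <- -0.75
word_count   <- 3

# Expected score with no active valence shifter:
# summed polarity divided by sqrt(word count).
expected <- polarity_bad / sqrt(word_count)
print(round(expected, 7))   # -0.4330127

# The two-comma results reported for 2.0.1 equal twice the expected score.
doubled <- 2 * expected
print(round(doubled, 7))    # -0.8660254

# The "the not, bad" result reported for 2.0.1 equals the expected score
# with its sign flipped, as if "not" had negated "bad" across the comma.
flipped <- -expected
print(round(flipped, 7))    # 0.4330127
```

Under these assumptions, all four sentences should score -0.4330127, which matches the development-version output above.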

scheddy commented 6 years ago

great, thanks