trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

Bug in the standard key - try sentiment(c("gave time", "time honored", "gave time honored")) #102

Status: Closed (ghost closed this issue 5 years ago)

ghost commented 5 years ago

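The treatment of multi-word lexicon entries (phrases) is wrong. Each two-word phrase scores correctly on its own, but the overlapping combination scores zero: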

sentiment(c("gave time", "time honored", "gave time honored"))
   element_id sentence_id word_count sentiment
1:          1           1          2 0.7071068
2:          2           1          2 0.7071068
3:          3           1          3 0.0000000

trinker commented 5 years ago
sentimentr::sentiment(c("gave time", "time honored", "he gave time honored then", 'love good nice'))
   element_id sentence_id word_count sentiment
1:          1           1          2 0.7071068
2:          2           1          2 0.7071068
3:          3           1          5 0.0000000
4:          4           1          3 1.1547005

lexicon::hash_sentiment_jockers_rinker[c('time honored', 'gave time','gave', 'time', 'honored', 
     'love', 'good', 'nice', 'love good', 'love nice')]

               x    y
 1: time honored 1.00
 2:    gave time 1.00
 3:         gave   NA
 4:         time   NA
 5:      honored 1.00
 6:         love 0.75
 7:         good 0.75
 8:         nice 0.50
 9:    love good   NA
10:    love nice   NA

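This illustrates the OP's point: the issue appears only when multi-word entries are involved. Note the null (control) example with love, good, nice: none of those word pairs is a phrase in the lexicon, and that sentence scores as expected.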
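As a rough sanity check on the numbers above, here is a minimal base-R sketch. It assumes the score is the sum of matched polarities divided by the square root of the word count. That scaling is inferred from the outputs in this thread (1/sqrt(2) = 0.7071068), not taken from the package source. The `score` helper is hypothetical, and valence shifters are ignored.

```r
# Polarity values copied from the lexicon lookup in this thread.
polarity <- c("time honored" = 1.00, "gave time" = 1.00, honored = 1.00,
              love = 0.75, good = 0.75, nice = 0.50)

# Hypothetical helper (not part of sentimentr): sum the matched polarities
# and divide by sqrt(word count). The scaling is an assumption inferred
# from the outputs shown in this thread.
score <- function(matched, word_count) {
  sum(polarity[matched]) / sqrt(word_count)
}

# These agree with the first, second, and fourth rows of the output.
score("gave time", 2)
score("time honored", 2)
score(c("love", "good", "nice"), 3)

# Candidate scores for "he gave time honored then" (5 words).
# Whichever entries are matched, the result is positive, never 0.
score("gave time", 5)                  # only the phrase "gave time"
score("time honored", 5)               # only the phrase "time honored"
score(c("gave time", "honored"), 5)    # a phrase plus the leftover single word
```

Under these assumptions, any non-empty match yields a positive score for the third sentence, so the observed 0 suggests that the overlapping phrases end up contributing nothing at all.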

ghost commented 5 years ago

Correct. The issue exists when the lexicon contains multi-word entries (phrases).