trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

two positives net off (?) #101

Closed ghost closed 5 years ago

ghost commented 5 years ago

Am I missing something fundamental, or is this by design?

The reproducible example below shows a netting-off effect that might need to be flagged to unsuspecting users or fixed.

Example: the words "x" and "y" are positive, and so are the phrases "x y" and "y z". Yet the phrase "x y z" ends up neutral, although one would hope it is positive, since both its words and its phrases are positive! Interestingly, "z x y" is positive and so is "x z y"; the latter is actually the most positive :)


library(sentimentr)

mykey <- data.frame(
  words = c("x", "y", "x y", "y z"),
  polarity = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

mytext <- c("x", "y", "z", "x z", "y z", "x y", "x y z", "z x y", "x z y")

sentiment("x", polarity_dt = as_key(mykey))
sentiment("y", polarity_dt = as_key(mykey))
sentiment("z", polarity_dt = as_key(mykey))
sentiment("x y", polarity_dt = as_key(mykey))
sentiment("y z", polarity_dt = as_key(mykey))
sentiment("x y z", polarity_dt = as_key(mykey))
sentiment("z x y", polarity_dt = as_key(mykey))
sentiment("x z y", polarity_dt = as_key(mykey))


sentiment("x", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 1 1 sentiment("y", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 1 1 sentiment("z", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 1 0 sentiment("x y", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 2 0.7071068 sentiment("y z", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 2 0.7071068 sentiment("x y z", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 3 0 sentiment("z x y", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 3 0.5773503 sentiment("x z y", polarity_dt = as_key(mykey)) element_id sentence_id word_count sentiment 1: 1 1 3 1.154701

ghost commented 5 years ago

Interestingly, using only words = c("x", "y", "y z") or words = c("x", "y", "x y") does give the expected results; it is only when "x", "y", "x y", and "y z" are all in the lexicon that "x y z" somehow gets netted off to zero. I'd appreciate any thoughts.
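
A minimal diagnostic sketch that might help narrow this down, assuming extract_sentiment_terms() works with the same custom polarity_dt key as sentiment():

library(sentimentr)

mykey <- as_key(data.frame(
  words = c("x", "y", "x y", "y z"),
  polarity = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
))

# List the positive/negative/neutral terms sentimentr picks out of each phrase;
# comparing "x y z" with "x z y" should show whether the overlapping bigrams
# "x y" and "y z" are both being applied to "x y z".
extract_sentiment_terms("x y z", polarity_dt = mykey)
extract_sentiment_terms("x z y", polarity_dt = mykey)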

trinker commented 5 years ago

Is this the same as https://github.com/trinker/sentimentr/issues/102? If so, why open two separate issues for the same problem?