Interestingly, using only words = c("x", "y", "y z") or words = c("x", "y", "x y") does give the expected results. It is only when "x", "y", "x y" and "y z" are all in the lexicon that "x y z" is somehow netted off to zero. I'd appreciate any thoughts.
Is this the same as https://github.com/trinker/sentimentr/issues/102? If so, why open two separate issues for the same problem?
Am I missing something fundamental, or is this by design?
The reproducible example below shows a netting-off effect that might need to be flagged to unsuspecting users, or fixed.
Example: the words "x" and "y" are positive, and so are the phrases "x y" and "y z". The phrase "x y z" ends up neutral, although one would expect it to be positive, since every word and phrase in it is positive! Interestingly, "z x y" is positive and so is "x z y". The latter is actually the most positive :)
    library(sentimentr)

    mykey <- data.frame(
      words = c("x", "y", "x y", "y z"),
      polarity = c(1, 1, 1, 1),
      stringsAsFactors = FALSE
    )

    mytext <- c("x", "y", "z", "x z", "y z", "x y", "x y z", "z x y", "x z y")

    sentiment("x", polarity_dt = as_key(mykey))
    sentiment("y", polarity_dt = as_key(mykey))
    sentiment("z", polarity_dt = as_key(mykey))
    sentiment("x y", polarity_dt = as_key(mykey))
    sentiment("y z", polarity_dt = as_key(mykey))
    sentiment("x y z", polarity_dt = as_key(mykey))
    sentiment("z x y", polarity_dt = as_key(mykey))
    sentiment("x z y", polarity_dt = as_key(mykey))