trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

polarity_dt requires words with spaces to work #117

Closed by mrwunderbar666 2 years ago

mrwunderbar666 commented 4 years ago

While I was testing some custom dictionaries, I noticed an unexpected behaviour in sentiment and sentiment_by:

If the provided polarity_dt contains no entries with spaces and a custom valence_shifters_dt is used, both functions break: every sentiment score comes back as 0.

I traced the bug to this section of the code:

https://github.com/trinker/sentimentr/blob/645401e85e8623540b6174fa76234a1012e2e143/R/sentiment.R#L369

You can reproduce it by running this short script:

library(sentimentr)

mytext <- c(
  'do you like it?  But I hate really bad dogs',
  'I am the best friend.',
  'Do you really like it?  I\'m not a fan',
  'I am good',
  'I am not good',
  'You are bad',
  'You are not bad'
)

# custom polarity dictionary: single-word entries only (no entry contains a space)
neg_words <- c("bad", "hate")
neg_words <- data.frame(x=neg_words, y=-1, stringsAsFactors = FALSE)
pos_words <- c("good", "like", "friend", "fan")
pos_words <- data.frame(x=pos_words, y=1, stringsAsFactors = FALSE)
my_dict <- rbind(pos_words, neg_words) 

# custom valence shifters: y = "1" marks a negator, y = "2" an amplifier
negs_words <- c('not')
negators <- data.frame(x=negs_words, y="1", stringsAsFactors = FALSE)
amplifier_words <- c('really')
amps <- data.frame(x=amplifier_words, y="2", stringsAsFactors = FALSE)

my_valenceshifters <- rbind(negators, amps)
my_valenceshifters <- as_key(my_valenceshifters, sentiment = FALSE, comparison = NULL)

my_sentiment <- as_key(my_dict, comparison = NULL)
is_key(my_sentiment)
mytext <- get_sentences(mytext)

# bug: the result is 0 for every element
sentiment_by(mytext, polarity_dt = my_sentiment, valence_shifters_dt = my_valenceshifters)

# workaround: add a neutral dummy entry that contains a space
dummy_spaced <- data.frame(x = c('space word'), y = 0, stringsAsFactors = FALSE)
my_sentiment <- update_key(my_sentiment, x = dummy_spaced)

# with the dummy entry in place, the results are as expected
sentiment_by(mytext, polarity_dt = my_sentiment, valence_shifters_dt = my_valenceshifters)
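
Until this is fixed, the workaround above could be wrapped in a small helper. This is only a sketch: has_spaced_term and add_dummy_spaced_term are hypothetical names, not part of sentimentr, and it assumes the key stores its terms in column x as in the script above. It reuses the same update_key call and the same neutral 'space word' placeholder.

```r
# Hypothetical helper (not part of sentimentr): TRUE if any term in the
# key's x column contains whitespace.
has_spaced_term <- function(key) {
  any(grepl("\\s", key[["x"]]))
}

# Hypothetical helper: if the key has no spaced term, add a neutral
# (y = 0) placeholder entry, mirroring the workaround in the script above.
add_dummy_spaced_term <- function(key, placeholder = "space word") {
  if (has_spaced_term(key)) {
    return(key)  # nothing to do, the key already has a spaced term
  }
  dummy <- data.frame(x = placeholder, y = 0, stringsAsFactors = FALSE)
  sentimentr::update_key(key, x = dummy)
}
```

It would be used as my_sentiment <- add_dummy_spaced_term(my_sentiment) before calling sentiment_by. The placeholder has a polarity of 0, so it should not change any scores even if the phrase appears in the text.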

trinker commented 4 years ago

Thanks for the issue! We're looking into this behavior.