Closed mrwunderbar666 closed 3 years ago
While testing some custom dictionaries, I noticed unexpected behaviour in `sentiment` and `sentiment_by`:
If the `polarity_dt` provided does not contain at least one entry with a space in it and a custom `valence_shifters_dt` is used, the functions return a sentiment of 0 for every sentence.
I traced the bug to this section of the code:
https://github.com/trinker/sentimentr/blob/645401e85e8623540b6174fa76234a1012e2e143/R/sentiment.R#L369
You can reproduce it by running this short script:

    library(sentimentr)

    mytext <- c(
      'do you like it? But I hate really bad dogs',
      'I am the best friend.',
      'Do you really like it? I\'m not a fan',
      'I am good',
      'I am not good',
      'You are bad',
      'You are not bad'
    )

    # custom polarity dictionary (no entries containing spaces)
    neg_words <- c("bad", "hate")
    neg_words <- data.frame(x = neg_words, y = -1, stringsAsFactors = FALSE)
    pos_words <- c("good", "like", "friend", "fan")
    pos_words <- data.frame(x = pos_words, y = 1, stringsAsFactors = FALSE)
    my_dict <- rbind(pos_words, neg_words)

    # custom valence shifter
    negs_words <- c('not')
    negators <- data.frame(x = negs_words, y = "1", stringsAsFactors = FALSE)
    amplifier_words <- c('really')
    amps <- data.frame(x = amplifier_words, y = "2", stringsAsFactors = FALSE)
    my_valenceshifters <- rbind(negators, amps)
    my_valenceshifters <- as_key(my_valenceshifters, sentiment = FALSE, comparison = NULL)

    my_sentiment <- as_key(my_dict, comparison = NULL)
    is_key(my_sentiment)

    mytext <- get_sentences(mytext)

    # result will be 0 for all
    sentiment_by(mytext, polarity_dt = my_sentiment, valence_shifters_dt = my_valenceshifters)

    # workaround: add a dummy entry that contains a space
    dummy_spaced <- data.frame(x = c('space word'), y = 0, stringsAsFactors = FALSE)
    my_sentiment <- update_key(my_sentiment, x = dummy_spaced)

    # results are as expected
    sentiment_by(mytext, polarity_dt = my_sentiment, valence_shifters_dt = my_valenceshifters)
Thanks for the issue! We're looking into this behavior.