trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

"You don’t get your money’s worth" - Senti Score: 0.3 - Any suggestions to improve? #57

Closed amrrs closed 6 years ago

amrrs commented 6 years ago

Hi, sentimentr is awesome.

Sentence: "You don’t get your money’s worth"

But I was wondering how this sentence could have a sentiment score of 0.3 (because of "money's worth"). Do you have any suggestions for how I could nullify this positive score?

My idea is to create a range of scores like -0.25 to +0.25 and call them neutral, thereby potentially avoiding false positives like this. But waiting for your thoughts!

PS: It's not a technical issue I suppose, but I thought there's no harm in raising it here!
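For what it's worth, the neutral-band idea can be sketched in a few lines of base R. This is only an illustration of the thresholding; the ±0.25 cutoff and the `label_sentiment` helper name are from the suggestion above, not anything built into sentimentr:

```r
# Bucket raw sentiment scores into negative / neutral / positive,
# treating anything inside (-neutral_band, neutral_band) as neutral
# to damp weak false positives.
label_sentiment <- function(score, neutral_band = 0.25) {
  ifelse(abs(score) < neutral_band, "neutral",
         ifelse(score > 0, "positive", "negative"))
}

label_sentiment(c(-0.5, 0.1, 0.3061862))
## "negative" "neutral" "positive"
```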

trinker commented 6 years ago

I think you need to fix your encoding. The apostrophe in money's is a special (curly, non-ASCII) character. Try:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, textclean, magrittr)

"You don't get your money’s worth" %>%
    replace_non_ascii() %>%
    sentiment()

##    element_id sentence_id word_count  sentiment
## 1:          1           1          6 -0.3061862

In this case I use the textclean package to swap out the non-ASCII character.
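If you only need to fix the quotes rather than strip every non-ASCII character, a narrower base-R alternative is a targeted `gsub` over the curly single-quote code points (U+2018/U+2019). The `straighten_quotes` helper name is mine, a sketch rather than anything from textclean:

```r
# Swap curly single quotes for the straight ASCII apostrophe only,
# leaving all other non-ASCII characters untouched.
straighten_quotes <- function(x) {
  gsub("[\u2018\u2019]", "'", x)
}

straighten_quotes("You don\u2019t get your money\u2019s worth")
## "You don't get your money's worth"
```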

amrrs commented 6 years ago

Thanks. You're right. But then I ran into another issue when the same smart/curly quote is in don't:


> "you dont get your moneys worth" %>% sentiment()
   element_id sentence_id word_count sentiment
1:          1           1          6 0.3061862
> "You don't get your money’s worth" %>%  replace_non_ascii() %>% sentiment()
   element_id sentence_id word_count  sentiment
1:          1           1          6 -0.3061862
> "You dont get your money’s worth" %>%  replace_non_ascii() %>% sentiment()
   element_id sentence_id word_count sentiment
1:          1           1          6 0.3061862

I guess the crude way is to replace those characters with the relevant ones using regex. But wouldn't it be good if your algorithm handled dont the same way it does don't?
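The regex stopgap mentioned above could look like the sketch below: a small hand-picked lookup that restores the apostrophe in bare negator contractions before scoring. The `fix_contractions` helper and its mapping are illustrative assumptions, not part of sentimentr:

```r
# Hypothetical helper: re-insert apostrophes into bare negator
# contractions so the stock valence shifter table matches them.
fix_contractions <- function(x) {
  fixes <- c("\\bdont\\b" = "don't", "\\bcant\\b" = "can't",
             "\\bwont\\b" = "won't", "\\bisnt\\b" = "isn't")
  for (pat in names(fixes)) x <- gsub(pat, fixes[[pat]], x)
  x
}

fix_contractions("you dont get your moneys worth")
## "you don't get your moneys worth"
```

Note this deliberately leaves non-negators like moneys alone; only the listed contractions are rewritten.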

trinker commented 6 years ago

It would not be good to treat dont as don't, as some folks wouldn't like me making that assumption on their behalf. Instead, sentimentr takes the philosophy that you can control these parameters yourself. You simply need to add these values to the valence shifter table: grab the negators that are contractions, remove the apostrophe, and add them back to the table. This is how you can do that:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, textclean, dplyr)

# Negators (y == 1) whose keys contain an apostrophe, with the
# apostrophe stripped out (e.g., "don't" -> "dont")
negator_contractions_sans_apostrophe <- lexicon::hash_valence_shifters %>%
    dplyr::filter(y == 1 & grepl("'", x)) %>%
    mutate(x = gsub("'", '', x))

# Add the apostrophe-free forms back to the valence shifter table
val_shift_tab <- update_valence_shifter_table(
    lexicon::hash_valence_shifters, 
    x = negator_contractions_sans_apostrophe
)

c("You don't get your money’s worth", "You dont get your money’s worth") %>%
    replace_non_ascii() %>%
    sentiment(valence_shifters_dt = val_shift_tab)

##    element_id sentence_id word_count  sentiment
## 1:          1           1          6 -0.3061862
## 2:          2           1          6 -0.3061862