Closed amrrs closed 6 years ago
I think you need to fix your encoding. The apostrophe in money’s is a non-ASCII smart quote, which is a special character. Try:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, textclean, magrittr)
"You don't get your money’s worth" %>%
    replace_non_ascii() %>%
    sentiment()
## element_id sentence_id word_count sentiment
## 1: 1 1 6 -0.3061862
In this case I use the textclean package to swap out the non-ASCII character.
Thanks. You're right. But then I ran into another issue when the apostrophe (the smart/curly quote) is dropped from don't:
> "you dont get your moneys worth" %>% sentiment()
element_id sentence_id word_count sentiment
1: 1 1 6 0.3061862
> "You don't get your money’s worth" %>% replace_non_ascii() %>% sentiment()
element_id sentence_id word_count sentiment
1: 1 1 6 -0.3061862
> "You dont get your money’s worth" %>% replace_non_ascii() %>% sentiment()
element_id sentence_id word_count sentiment
1: 1 1 6 0.3061862
I guess the lame way is replacing those characters with the relevant ones using regex. But wouldn't it be good if your algo handled dont the same way it handles don't?
It would not be good to treat dont as don't as there would be folks who don't like that I'd make such an assumption on their behalf. Instead sentimentr takes the philosophy that one can control these parameters oneself. You simply need to add these values to the valence shifter table. Essentially you want to grab the negators that are contractions, remove the contraction, and add them back to the table. This is how you can do this:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr, textclean, dplyr)
negator_contractions_sans_apostrophe <- lexicon::hash_valence_shifters %>%
    dplyr::filter(y == 1 & grepl("'", x)) %>%
    mutate(x = gsub("'", '', x))
val_shift_tab <- update_valence_shifter_table(
    lexicon::hash_valence_shifters,
    x = negator_contractions_sans_apostrophe
)
c("You don't get your money’s worth", "You dont get your money’s worth") %>%
    replace_non_ascii() %>%
    sentiment(valence_shifters_dt = val_shift_tab)
## element_id sentence_id word_count sentiment
## 1: 1 1 6 -0.3061862
## 2: 2 1 6 -0.3061862
Hi,
sentimentr is awesome. Sentence:
"You don’t get your money’s worth"
But I was wondering how this sentence could have a sentiment score of 0.3 (because of money's worth). I'm just looking for your suggestions (if there are any) on whether I could do something to nullify its positive score.
My idea is to create a range of scores like -0.25 to +0.25 and call them neutral, potentially avoiding false positives like this. But waiting for your thoughts!
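A minimal sketch of that neutral-band idea in R (the +/- 0.25 cutoff and the label_sentiment helper name are illustrative choices of mine, not anything built into sentimentr):

if (!require("pacman")) install.packages("pacman")
pacman::p_load(sentimentr)

# Hypothetical helper: collapse weak scores into a "neutral" bucket.
# Scores inside (-cutoff, cutoff) are labeled neutral; the rest keep
# their sign as positive/negative.
label_sentiment <- function(score, cutoff = 0.25) {
    ifelse(abs(score) < cutoff, "neutral",
           ifelse(score > 0, "positive", "negative"))
}

out <- sentiment("You dont get your moneys worth")
out$label <- label_sentiment(out$sentiment)
out

The right cutoff is an empirical question; you would want to tune it against a labeled sample of your own text rather than take 0.25 on faith.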
PS: It's not a technical issue I suppose, but I thought there's no harm raising it here!