trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

Unexpected high score #64

Closed trinker closed 6 years ago

trinker commented 6 years ago

From @Lilly Wang via email

I am using the get_sentiment function from the "exploratory" package to score my text. My understanding is that this function uses "sentimentr", which you built. Here is an example of one of my problems:

The text announces events at many different high schools:

"Congresswoman Matsui, as a member of the Cash for College Coalition, is hosting January 27th at Hiram Johnson High School and February 17th at Kennedy High School. The full list of workshops is below:January 20, 2010: Foothill High School, 6:30-8:30 pmJanuary 21, 2010: Sacramento Charter High School, 6:00-8:00 pmJanuary 26, 2010: Valley High School, 6:00-8:00 pmJanuary 27, 2010: Hiram Johnson High School, 6:00-8:00 pmJanuary 28, 2010: Florin High School, 6:00-8:00 pmFebruary 1, 2010: River City High School, 6:00-8:00 pmFebruary 2, 2010: New San Juan High School, 6:30-8:30 pmFebruary 3, 2010: Cordova High School, 6:30-8:30 pmFebruary 4, 2010: West Campus High School, 6:00-8:00 pmFebruary 9, 2010: Grant Union High School, 6:00-8:00 pmFebruary 10, 2010: Rio Linda High School, 6:00-8:00 pmFebruary 11, 2010: Natomas High School, 6:00-8:00 pmFebruary 17, 2010: Kennedy High School, 6:00-8:00 pmFebruary 23, 2010: Encina High School, 6:00-8:00 pmFebruary 25, 2010: Burbank High School, 6:00-8:00 pmFebruary 27, 2010, Woodland Community College, 10:00 am-2:00 pm"

When I ran this text through get_sentiment, it returned a score over 100, which does not seem correct. Can you suggest how I should handle this?

library(exploratory) 
library(dplyr)
library(sentimentr)

mytext<-"Congresswoman Matsui, as a member of the Cash for College Coalition, is hosting January 27th at Hiram Johnson High School and February 17th at Kennedy High School. The full list of workshops is below:January 20, 2010: Foothill High School, 6:30-8:30 pmJanuary 21, 2010: Sacramento Charter High School, 6:00-8:00 pmJanuary 26, 2010: Valley High School, 6:00-8:00 pmJanuary 27, 2010: Hiram Johnson High School, 6:00-8:00 pmJanuary 28, 2010: Florin High School, 6:00-8:00 pmFebruary 1, 2010: River City High School, 6:00-8:00 pmFebruary 2, 2010: New San Juan High School, 6:30-8:30 pmFebruary 3, 2010: Cordova High School, 6:30-8:30 pmFebruary 4, 2010: West Campus High School, 6:00-8:00 pmFebruary 9, 2010: Grant Union High School, 6:00-8:00 pmFebruary 10, 2010: Rio Linda High School, 6:00-8:00 pmFebruary 11, 2010: Natomas High School, 6:00-8:00 pmFebruary 17, 2010: Kennedy High School, 6:00-8:00 pmFebruary 23, 2010: Encina High School, 6:00-8:00 pmFebruary 25, 2010: Burbank High School, 6:00-8:00 pmFebruary 27, 2010, Woodland Community College, 10:00 am-2:00 pm"
myds <- data.frame(mytext)
myds$mytext <- as.character(myds$mytext)
mytest <- exploratory::clean_data_frame(myds) %>%
  mutate(myscr = get_sentiment(mytext)) %>%
  select(mytext, myscr)
mytest
    myscr
1 120.798
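
One possible way to narrow this down is to score similar text with sentimentr directly, both per sentence and per element, and compare the result with the value returned through exploratory. This is only a sketch: it assumes sentimentr is installed, and `sample_text` is a hypothetical, shortened stand-in for the announcement above, not the full text. No expected scores are shown because they depend on the sentimentr version and its default lexicon.

```r
library(sentimentr)

# Hypothetical, shortened stand-in for the announcement text above,
# keeping the same run-together layout ("pmJanuary ...").
sample_text <- paste0(
  "The full list of workshops is below:",
  "January 20, 2010: Foothill High School, 6:30-8:30 pm",
  "January 21, 2010: Sacramento Charter High School, 6:00-8:00 pm"
)

# Split the text into sentences explicitly before scoring.
sentences <- get_sentences(sample_text)

# One row per sentence, with its word count and sentiment score.
per_sentence <- sentiment(sentences)

# One row per input element, with the averaged sentiment.
per_element <- sentiment_by(sentences)

print(per_sentence)
print(per_element)
```

If the per-sentence rows look reasonable but one sentence carries an extreme value, that sentence is a good candidate for a smaller reproducible example.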