trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

Polarity categorization? #128

Closed sadettindemirel closed 2 years ago

sadettindemirel commented 2 years ago

Some sentiment dictionaries offer categorisation for a given text or for calculated scores. Is there any categorisation of scores or text in sentimentr? Can we say that 0 is neutral, below 0 is negative, and above 0 is positive? It need not be a three-class categorisation; it could consist of only negative and positive categories too.

Another thing, not an issue but just a curiosity: have you ever used VADER for sentiment analysis? It was also developed for valence-aware sentiment analysis, and it has been ported to R. I saw your performance review of other sentiment libraries; if you have used it, what do you think about VADER?

trinker commented 2 years ago

Hello. Thanks for the comments.

Some sentiment dictionaries offer categorisation for a given text or for calculated scores. Is there any categorisation of scores or text in sentimentr? Can we say that 0 is neutral, below 0 is negative, and above 0 is positive? It need not be a three-class categorisation; it could consist of only negative and positive categories too.

I added a section to ?sentiment describing:

sentiment - Sentiment/polarity score (note: sentiments less than zero is negative, 0 is neutral, and greater than zero positive polarity)

Turning numeric values into categories is pretty standard, so I have not provided a function to do this explicitly. Instead, I added this code chunk to the examples in sentimentr describing how to make categories (you could use sign and/or cut to do this as well):

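## Categorize the polarity (tidyverse vs. data.table):
## (mytext is assumed to be a character vector of texts, as in the ?sentiment examples)
library(sentimentr)
library(dplyr)
sentiment(mytext) %>%
    as_tibble() %>%
    mutate(category = case_when(
        sentiment < 0 ~ 'Negative',
        sentiment == 0 ~ 'Neutral',
        sentiment > 0 ~ 'Positive'
    ) %>%
        ## fix the level order so tables and plots sort Negative < Neutral < Positive
        factor(levels = c('Negative', 'Neutral', 'Positive'))
    )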

##   element_id sentence_id word_count sentiment category
##        <int>       <int>      <int>     <dbl> <fct>   
## 1          1           1          4     0.25  Positive
## 2          1           2          6    -1.87  Negative
## 3          2           1          5     0.581 Positive
## 4          3           1          5     0.402 Positive
## 5          3           2          4     0     Neutral 
## 6          4           1          4     0     Neutral

library(data.table)
dt <- sentiment(mytext)[, category := factor(fcase(
        sentiment < 0, 'Negative', 
        sentiment == 0, 'Neutral', 
        sentiment > 0, 'Positive'
    ), levels = c('Negative', 'Neutral', 'Positive'))][]
dt

##    element_id sentence_id word_count  sentiment category
## 1:          1           1          4  0.2500000 Positive
## 2:          1           2          6 -1.8677359 Negative
## 3:          2           1          5  0.5813777 Positive
## 4:          3           1          5  0.4024922 Positive
## 5:          3           2          4  0.0000000  Neutral
## 6:          4           1          4  0.0000000  Neutral
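
For completeness, here is a rough sketch of the sign and cut alternatives mentioned above, using base R only. The scores vector is just the sentiment column copied from the output above (in practice it would come from sentiment(mytext)$sentiment), and the names scores, labels, category_sign, category_cut, and two_class are made up for this illustration. The two-class variant folds zero into the non-negative group, which is an arbitrary choice and not something sentimentr prescribes.

```r
## Scores copied from the output above; normally sentiment(mytext)$sentiment
scores <- c(0.25, -1.8677359, 0.5813777, 0.4024922, 0, 0)
labels <- c('Negative', 'Neutral', 'Positive')

## sign() returns -1, 0, or 1, so adding 2 gives an index into `labels`
category_sign <- factor(labels[sign(scores) + 2], levels = labels)

## cut() on the sign: the breaks at -0.5 and 0.5 separate -1, 0, and 1
category_cut <- cut(
    sign(scores),
    breaks = c(-Inf, -0.5, 0.5, Inf),
    labels = labels
)

## Two-class variant: right = FALSE makes the intervals [-Inf, 0) and [0, Inf),
## so a score of exactly 0 lands in the second group (an arbitrary choice)
two_class <- cut(
    scores,
    breaks = c(-Inf, 0, Inf),
    labels = c('Negative', 'Non-negative'),
    right = FALSE
)

data.frame(scores, category_sign, category_cut, two_class)
```

The first two columns should agree row for row with the category column shown above; which approach to use is mostly a matter of taste.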

Another thing, not an issue but just a curiosity: have you ever used VADER for sentiment analysis? It was also developed for valence-aware sentiment analysis, and it has been ported to R. I saw your performance review of other sentiment libraries; if you have used it, what do you think about VADER?

No, I have not tried it, though I am aware of it. At the time I ran the tests I don't think it had been ported to R, or at least I was unaware of it. If the community tested VADER against other approaches, it would be interesting to see the results.