trinker / qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
http://cran.us.r-project.org/web/packages/qdap/index.html

Sentiment analysis with qdap #208

Closed stripathi08 closed 9 years ago

stripathi08 commented 9 years ago

I am trying to do a sentiment analysis using the package, and I want quite a few of my own words to be treated as positive or negative when calculating polarity. When I use sentiment_frame, it only uses the frame and not the qdap dictionaries. Is there a way I can use both my own words and the package's built-in word lists?
Here's an example -

corpus <- "He is a liar."
frame <- sentiment_frame("liar", "hater", pos.weights=1, neg.weights=-1)
z <- counts(polarity(corpus, polarity.frame=frame))
z
##   all wc polarity pos.words neg.words      text.var
## 1 all  4      0.5      liar         - He is a liar.

corpus <- "He is a good liar."
frame <- sentiment_frame("liar", "hater", pos.weights=1, neg.weights=-1)
z <- counts(polarity(corpus, polarity.frame=frame))
z
##   all wc polarity pos.words neg.words           text.var
## 1 all  5    0.447      liar         - He is a good liar.

Why is the polarity decreasing?

trinker commented 9 years ago

Why is the polarity decreasing?

Polarity is relative to the number of words in the sentence. You've added a word, so the denominator changes from √4 to √5 (the word count goes from 4 to 5). I think spending some time with the documentation will help you understand the specifics of how the score is calculated.
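To make the arithmetic concrete, here is a minimal base R sketch of the two scores from the question. It is a simplification, not qdap's actual implementation: it assumes each matched word contributes only its frame weight and that no negators or amplifiers are involved. `simple_polarity()` is a hypothetical helper introduced for illustration.

```r
# Simplified sketch, NOT qdap's implementation: sum the weights of the
# matched polarity words and divide by the square root of the word count.
# simple_polarity() is a hypothetical helper, not a qdap function.
simple_polarity <- function(weights, word_count) {
  sum(weights) / sqrt(word_count)
}

# "He is a liar."      -> one match ("liar", weight +1 in the frame), 4 words
short_score <- simple_polarity(1, 4)

# "He is a good liar." -> same single match, 5 words ("good" is not in the frame)
long_score <- simple_polarity(1, 5)

round(c(short = short_score, long = long_score), 3)
```

Under these assumptions the sketch gives 0.5 and 0.447, the same values reported in the question: the numerator stays at 1 while the denominator grows.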

This is with reference to your package's function polarity. I am trying to do a sentiment analysis using the package, and I want quite a few of my own words to be treated as positive or negative when calculating polarity. When I use sentiment_frame, it only uses the frame and not the qdap dictionaries. Is there a way I can use both my own words and the package's built-in word lists?

You can include the original qdapDictionaries positive/negative word lists in a frame along with your own words, because sentiment_frame takes a vector for positives and a vector for negatives. Just wrap the qdapDictionaries terms and your own terms with c() to build the vector you pass to each argument:

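library(qdap)  # polarity(), sentiment_frame(); positive.words/negative.words come from qdapDictionaries

# Default polarity dictionary
mycorpus <- c("Wow that's a raw move.", "His jokes are so corny")
counts(polarity(mycorpus))

# Custom frame: the built-in word lists plus your own terms
POLKEY <- sentiment_frame(c(positive.words, "raw"), c(negative.words, "corny"))
counts(polarity(mycorpus, polarity.frame=POLKEY))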

Which yields:

## > mycorpus <- c("Wow that's a raw move.", "His jokes are so corny")
## > counts(polarity(mycorpus))
##   all wc polarity pos.words neg.words               text.var
## 1 all  5    0.447       wow         - Wow that's a raw move.
## 2 all  5    0.000         -         - His jokes are so corny

## > POLKEY <- sentiment_frame(c(positive.words, "raw"), c(negative.words, "corny"))
## > counts(polarity(mycorpus, polarity.frame=POLKEY))
##   all wc polarity pos.words neg.words               text.var
## 1 all  5    0.894  wow, raw         - Wow that's a raw move.
## 2 all  5   -0.447         -     corny His jokes are so corny

stripathi08 commented 9 years ago

Thank you for answering this, sir. I will surely go through the documentation once again. Maybe I missed some key points.

trinker commented 9 years ago

I think the relevant line from the documentation is:

Last, these context cluster (x_i^T) are summed and divided by the square root of the word count (√n)
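Written as a formula, that sentence corresponds roughly to the following. The symbol δ is only a label assumed here for the resulting score, n is the word count of the sentence, and the sum runs over the context clusters found in it:

```latex
\delta = \frac{\sum_i x_i^{T}}{\sqrt{n}}
```

For the two-sentence example above with the combined frame, and assuming no valence shifters come into play, the first sentence has two positive matches in five words, so 2/√5 ≈ 0.894, and the second has one negative match in five words, so −1/√5 ≈ −0.447. Both agree with the printed output.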

stripathi08 commented 9 years ago

Got it. Thanks again!