Adding appropriate weights to words

trinker / qdap

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis


I want to cluster the data on a 10-point scale, and I have been trying a few different approaches. First, I build a polarity frame with reduced weights:

frame <- sentiment_frame(c(positive.words, pos.words), c(negative.words, neg.words), pos.weights = 0.5, neg.weights = -0.5)

I do not want these words to affect the score too strongly, hence the reduced weights. For higher-intensity words, I use:

polarity(strip(set), polarity.frame = frame, amplifiers = c(amplification.words, imp_posneg), deamplifiers = c(deamplification.words))

Here, the words in imp_posneg contribute a higher intensity.

Because I have a huge data set, I cannot tell the results apart from the scores, as they do not vary much. Is this a proper way to improve the results?

This should give you something between 0 and 10:

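library(qdap)
# Score each sentence, constraining polarity to [-1, 1]
x <- polarity(DATA.SPLIT$state, constrain = TRUE)
y <- counts(x)
# Logistic rescale of the polarity column, multiplied up to a 10-point scale
((1 - (1 / (1 + exp(y[["polarity"]])))) * 2) * 5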
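To see what range the rescaling above actually covers, here is a minimal base R sketch. It assumes, as the qdap documentation describes, that `constrain = TRUE` bounds polarity to [-1, 1]. The function names `rescale_0_10` and `linear_0_10` are hypothetical, and the linear map is only an alternative for comparison, not something proposed in this thread.

```r
# The logistic rescale from the reply, wrapped in a (hypothetical) helper.
rescale_0_10 <- function(p) ((1 - (1 / (1 + exp(p)))) * 2) * 5

# Evaluate at the two ends and the middle of the constrained range [-1, 1].
rescale_0_10(c(-1, 0, 1))

# Alternative for comparison: a linear map that sends -1 to 0 and 1 to 10.
linear_0_10 <- function(p) (p + 1) * 5
linear_0_10(c(-1, 0, 1))
```

On constrained input the logistic version stays within roughly 2.7 to 7.3, so it always lands inside 0-10 but does not use the whole scale. If the scores look too similar, the linear map may spread them more widely.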

> As, I have a huge set, I am not able to make out from the scores as they don't vary much. Is it a proper way to improve the results?

I'm not entirely sure what you want. The algorithm is what it is. You can adjust the weights of the negative/positive words and the weight of the amplifiers. Much of the sentiment in language is conveyed through prosody, not word choice, so scores based on word choice alone tend toward 0 polarity.
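As a rough sketch of adjusting those weights: `sentiment_frame` accepts either a single weight or one weight per word, so milder and stronger words can get different weights in the polarity frame itself, and `polarity` takes an `amplifier.weight` argument. The word lists and the weight values below are made up for illustration and are not from this thread.

```r
library(qdap)

# Hypothetical word lists, for illustration only.
mild_pos   <- c("fine", "okay")
strong_pos <- c("superb", "outstanding")
mild_neg   <- c("meh", "dull")
strong_neg <- c("dreadful", "atrocious")

# One weight per word: 0.5 for mild words, 1 for strong words.
frame <- sentiment_frame(
  positives   = c(mild_pos, strong_pos),
  negatives   = c(mild_neg, strong_neg),
  pos.weights = c(rep(0.5, length(mild_pos)), rep(1, length(strong_pos))),
  neg.weights = c(rep(-0.5, length(mild_neg)), rep(-1, length(strong_neg)))
)

# amplifier.weight (assumed value 0.9 here) controls how strongly
# amplifiers and deamplifiers shift the score of a nearby polarized word.
res <- polarity(
  c("the food was okay", "the food was superb"),
  polarity.frame   = frame,
  amplifier.weight = 0.9
)
counts(res)
```

Putting the intensity into the frame keeps the amplifier list for true modifiers such as "very", rather than mixing polarized words into it.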

You can try Matthew Jockers's excellent syuzhet package (http://cran.r-project.org/web/packages/syuzhet/index.html), which allows for several different algorithms.

Adding appropriate weights to words #210