trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

Totally different results in qdap polarity and sentimentr #119

Closed GabriellaS-K closed 4 years ago

GabriellaS-K commented 4 years ago

Hi,

This isn't an issue with the code, but with my understanding.

I have responses to an online survey on medical symptoms from 3000 people. I am trying to see whether the wording of the question affects the type of response (i.e. does one question elicit a 'less negative' list of symptoms than another, and are there any positive symptoms?).

I have 3 different ways of asking the question, so 3 different primes, and about 1000 people in each group. I initially cleaned the text (lower case, changed numbers into letters etc) and used qdap's polarity function, this found different avg polarity in the three groups (-0.5, -0.3, -0.2). I then found the sentimentr package and ran my data through that and got almost identical answers (-0.22,-0.25, -0.21).

I clearly have not understood what I'm doing wrong, could it be that qdap's polarity and sentimentr give such different responses?

Also, a second question-should I clean + lemmatise the responses in both packages? or does it only need to be done in qdap?

Sorry for such a long question!!!

trinker commented 4 years ago

I do not understand. You said

> I initially cleaned the text (lower case, changed numbers into letters etc) and used qdap's polarity function, this found different avg polarity in the three groups (-0.5, -0.3, -0.2). I then found the sentimentr package and ran my data through that and got almost identical answers (-0.22,-0.25, -0.21).

This indicates the two algorithms are similar?

Additionally, I would not expect sentimentr and qdap to be identical. sentimentr improves on qdap in both speed and accuracy. IMO they are not comparable.

> Also, a second question-should I clean + lemmatise the responses in both packages?

You might want to use some of the replace_ functions for cleaning. That's a researcher call, but sentimentr makes these tools available. As for lemmatising, it is not a requirement of either package. As a researcher you may have a reason to do so, but in my experience I have never lemmatised before using sentimentr.
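For anyone landing here with a similar setup, a minimal sketch of the workflow discussed above might look like the following. The data frame, its column names (`prime`, `response`) and the example texts are made up for illustration and are not the survey data from this issue. The cleaning step uses `replace_emoticon()` only as one example of the replace_ helpers; whether any cleaning is appropriate remains a researcher call.

```r
library(sentimentr)

# Hypothetical toy data: the column names `prime` and `response` and the
# example texts are assumptions for illustration, not the survey data.
responses <- data.frame(
  prime = c("A", "A", "B", "B", "C", "C"),
  response = c(
    "I had a terrible headache. It was not good at all.",
    "Mild nausea :( but I felt fine afterwards.",
    "Severe back pain and I could hardly sleep.",
    "Nothing bad, I actually felt more energetic.",
    "Some dizziness. It was not too unpleasant.",
    "Awful cough and a very sore throat."
  ),
  stringsAsFactors = FALSE
)

# Optional cleaning step: swap emoticons for word equivalents.
responses$response <- replace_emoticon(responses$response)

# Split each response into sentences, then average the sentence-level
# sentiment within each prime group.
by_prime <- with(
  responses,
  sentiment_by(get_sentences(response), by = list(prime))
)

print(by_prime)
```

The `ave_sentiment` column of the result holds one average per group, which is the figure to compare across the three primes. No lemmatising step is included, in line with the comment above that neither package requires it.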

GabriellaS-K commented 4 years ago

Sorry, I don't mean that (-0.5, -0.3, -0.2) and (-0.22, -0.25, -0.21) are the same, but that the sentimentr averages (-0.22, -0.25, -0.21) are very similar to one another while the qdap averages (-0.5, -0.3, -0.2) are not. The algorithms are not similar, but I'm trying to decide which to use. It sounds like sentimentr is the better package to use?

Thank you so much for your answers!

trinker commented 4 years ago

Ah, gotcha. My advice is to use sentimentr.

GabriellaS-K commented 4 years ago

thanks!