sentiment-analysis-spanish / sentiment-spanish

MIT License

Not actually an Issue.. #1

Closed bpwinter closed 4 years ago

bpwinter commented 4 years ago

Hi there, just to say I've found this project really interesting. I've worked a lot with NLP and Twitter and used this other package in production: https://github.com/aylliote/senti-py That package actually works pretty well with Twitter, though it is a bit too slow when you process thousands of tweets per minute. Yours is much faster, and I see that it is in active development right now, so I'm sure it will get even better.

I have some concerns that I would like to share with you about sentiment in Spanish:

1) There are sooo many different types of datasets (Twitter, reviews, Wikipedia, conversational, etc.) and Spanish slangs (the diversity across Latin America and Spain alone is enough to drive us nuts) that we can't find a sentiment analyzer that's (at least slightly) universal.

2) I just can't really understand how sentiment drivers work: how can we know to whom or what an emotion is directed? For example, if I have a negative tweet about Trump, that doesn't mean the author is mad at him. It could be that the author is happy with Trump but very mad at Joe Biden and just mentioned both, or even that the author is happy with Trump and just mad about some other random situation. It's a VERY VERY complex problem, but solving it is essential to make sentiment really valuable for analysis.

If you would like to share any insights or strategies about these problems, that would be great.

Congratulations on your work, and thank you for sharing it!

HugoJBello commented 4 years ago

Hi, thanks for reaching out.

The library that you link (senti-py) is indeed similar to mine. The greatest difference that I find is that senti-py does not use neural networks but Multinomial Naive Bayes.

Regarding the different datasets, I believe the best ones for sentiment analysis are reviews (products, films, restaurants…) because the language used in them can be extrapolated to other texts easily. I am not sure, but I believe slang is not that much of a problem for sentiment analysis: the core positive and negative words are more or less the same across the different slangs and accents in Spanish. So far I have not had problems with that. I am working on a paper that analyzes sentiment in tweets from Spain and Peru using this library, and the statistical results are similar.

As you point out, sentiment analysis is sometimes unreliable because it does not catch inner meanings or subtleties, as in the example that you mention. I believe, though, that it works when you measure many tweets, since this effect goes away asymptotically: individual misreadings tend to average out as the number of tweets grows. In fact, the example that you mention points towards a known method to capture ideology in tweets: if you measure the sentiment of the tweets from a particular user that mention Trump and you find that they are mostly positive, you can get a feeling that that user is pro-Trump.

See you!
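The aggregation idea in the reply above (score many tweets that mention a keyword, then average per user) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's code: `toy_score`, its tiny word lists, `mean_sentiment_by_user`, and the sample tweets are all hypothetical stand-ins. In practice the `score` argument would be replaced by whatever real model returns a value between 0 (negative) and 1 (positive).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon standing in for a real sentiment model.
POSITIVE = {"bueno", "genial", "excelente", "feliz"}
NEGATIVE = {"malo", "terrible", "horrible", "triste"}


def toy_score(text):
    """Return a score in [0, 1]; 0.5 means no sentiment words were found."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.5
    return pos / (pos + neg)


def mean_sentiment_by_user(tweets, keyword, score=toy_score, min_tweets=2):
    """Average the score of each user's tweets that mention `keyword`.

    Users with fewer than `min_tweets` matching tweets are dropped,
    because a single tweet says little about a user's overall stance.
    """
    scores = defaultdict(list)
    for user, text in tweets:
        if keyword.lower() in text.lower():
            scores[user].append(score(text))
    return {u: mean(s) for u, s in scores.items() if len(s) >= min_tweets}


# Made-up sample data: (user, tweet text) pairs.
tweets = [
    ("ana", "Trump es genial"),
    ("ana", "Trump excelente discurso"),
    ("ana", "Qué día tan terrible"),  # no keyword, so it is ignored
    ("luis", "Trump es terrible"),
    ("luis", "Trump horrible y malo"),
    ("eva", "Trump es bueno"),  # only one matching tweet, so eva is dropped
]

result = mean_sentiment_by_user(tweets, "trump")
print(result)
```

A per-user mean close to 1 would suggest a mostly positive stance towards the keyword and a mean close to 0 a mostly negative one. The `min_tweets` threshold reflects the point that the signal only becomes meaningful over many tweets; the value 2 here is arbitrary.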
