thisandagain / sentiment

AFINN-based sentiment analysis for Node.js.

MIT License

Add Twitter validation dataset #110

Status: Open (opened by thisandagain 7 years ago)

thisandagain commented 7 years ago

We currently validate against a dataset from UCI that includes Amazon, Yelp, and IMDB reviews. This works well, but it would be useful to also validate against less formal text, particularly text that includes emoji. Twitter is widely used as a corpus across many areas of NLP, so a suitable dataset should not be too hard to find, though tracking one down will take some research.
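To make the validation step concrete, here is a minimal sketch of an accuracy check over labeled texts. It is not the project's actual validation script: the `text<TAB>label` line format, the `parseLabeled` and `accuracy` helpers, and the tiny `toyLexicon` scorer are all hypothetical stand-ins. In practice the scorer would be the library's own analysis function; the toy version exists only so the example runs on its own.

```javascript
// Minimal validation sketch (hypothetical helpers, not the library's API).
// Assumed input format: one sample per line, "text<TAB>label",
// where label 1 = positive and 0 = negative.

function parseLabeled(raw) {
  return raw
    .split('\n')
    .filter((line) => line.includes('\t')) // skip blank or malformed lines
    .map((line) => {
      const tab = line.lastIndexOf('\t');
      return {
        text: line.slice(0, tab),
        label: Number(line.slice(tab + 1).trim()),
      };
    });
}

// Fraction of samples where the scorer's sign matches the label.
// A score > 0 counts as a positive prediction; anything else as negative.
function accuracy(samples, score) {
  if (samples.length === 0) return 0;
  let correct = 0;
  for (const { text, label } of samples) {
    const predicted = score(text) > 0 ? 1 : 0;
    if (predicted === label) correct += 1;
  }
  return correct / samples.length;
}

// Toy stand-in for a real AFINN-style scorer: sums word weights from a tiny
// hypothetical lexicon (including two emoji). No negation or punctuation handling.
const toyLexicon = new Map([
  ['good', 3], ['love', 3], ['bad', -3], ['hate', -3],
  ['😀', 2], ['😠', -2],
]);

function toyScore(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .reduce((sum, token) => sum + (toyLexicon.get(token) || 0), 0);
}

const raw = [
  'I love this phone 😀\t1',
  'bad service, never again\t0',
  'meh 😠\t0',
  'good stuff\t1',
  'not bad at all\t1',
].join('\n');

const samples = parseLabeled(raw);
console.log(accuracy(samples, toyScore)); // 0.8
```

The one miss is `not bad at all`, which the toy scorer rates negative because it ignores negation. Running a harness like this over informal, emoji-heavy samples is how gaps of that kind would show up as a lower score.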

thisandagain commented 7 years ago

Related to #24

dparlevliet commented 6 years ago

On this subject, this link might help: https://finnaarupnielsen.wordpress.com/2011/03/16/afinn-a-new-word-list-for-sentiment-analysis/

Edit: Apologies, I misunderstood the issue. I see you already use the AFINN word list, and this issue is only about validation.

pdw207 commented 6 years ago

I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis

It may also be interesting to examine how well this works on longer texts; one example is the Cornell Movie Review Dataset.