thisandagain opened this issue 7 years ago
Related to #24
On this subject, this post on the AFINN word list might help: https://finnaarupnielsen.wordpress.com/2011/03/16/afinn-a-new-word-list-for-sentiment-analysis/
Edit: Apologies, I misunderstood the issue. I see you already use AFINN and that this issue is purely about validation.
I think this is what you are looking for: https://old.datahub.io/dataset/twitter-sentiment-analysis. It may also be interesting to examine how well this works on longer texts; one example is the Cornell Movie Review Dataset.
We currently validate against a dataset from UCI that includes Amazon, Yelp, and IMDB. This is great, but it would be nice to include less formal texts (particularly ones that contain emoji) in validation. Many NLP areas have been well explored using Twitter as a corpus, so I don't think a suitable dataset should be too difficult to track down, though it will require some research.
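To make the validation gap concrete, here is a minimal Python sketch of an accuracy check for a lexicon-based scorer. It is not the project's actual validation code. It assumes the labelled files use one `sentence<TAB>label` pair per line (1 = positive, 0 = negative). `TOY_LEXICON`, `WORDS_ONLY`, `score`, `parse_labelled_lines`, and `accuracy` are hypothetical names. The lexicon is a tiny AFINN-style stand-in rather than the real word list, and the sample sentences are invented for illustration. Dropping the emoji entries leaves accuracy on the formal sample unchanged but lowers it on the informal one, which is the kind of gap a Twitter-style set would expose.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy lexicon in the AFINN style: token -> integer valence.
# A real harness would load the full word list (plus an emoji list) instead.
TOY_LEXICON: Dict[str, int] = {
    "good": 3, "great": 3, "love": 3,
    "bad": -3, "terrible": -3, "hate": -3,
    "😀": 2, "😡": -3,
}
# Same lexicon with the emoji entries removed, for comparison.
WORDS_ONLY: Dict[str, int] = {k: v for k, v in TOY_LEXICON.items() if k.isalpha()}

# Word runs, or any single non-word, non-space code point (so a single-code-point
# emoji becomes its own token; multi-code-point emoji get split into pieces).
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def score(text: str, lexicon: Dict[str, int]) -> int:
    """Sum the valence of every token that appears in the lexicon."""
    return sum(lexicon.get(tok, 0) for tok in TOKEN_RE.findall(text.lower()))


def parse_labelled_lines(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse assumed 'sentence<TAB>label' lines, skipping blank ones."""
    examples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        text, label = line.rsplit("\t", 1)
        examples.append((text, int(label)))
    return examples


def accuracy(examples: List[Tuple[str, int]], lexicon: Dict[str, int]) -> float:
    """Fraction of examples whose predicted polarity matches the label.

    A score above 0 is predicted positive; anything else counts as negative.
    """
    if not examples:
        raise ValueError("no examples to validate against")
    correct = sum(
        (1 if score(text, lexicon) > 0 else 0) == label for text, label in examples
    )
    return correct / len(examples)


# Invented samples: one "formal" review-style set, one "informal" tweet-style set.
formal = parse_labelled_lines([
    "Great phone, I love it.\t1",
    "Terrible battery life.\t0",
])
informal = parse_labelled_lines([
    "new phone 😀\t1",
    "battery died again 😡\t0",
    "meh\t0",
])

for name, data in (("formal", formal), ("informal", informal)):
    print(name, accuracy(data, TOY_LEXICON), accuracy(data, WORDS_ONLY))
```

Treating zero-scoring texts as negative is a simplifying choice; a real harness might report them as neutral or separately. The single-code-point tokenizer would also need extending for emoji built from ZWJ sequences or skin-tone modifiers, which are common in tweets.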