More text classification data

Currently, text is only classified by sentiment and language, but there are quite a few other ways to classify text. I propose that text also be classified by formality, controversialness (this can overlap with sentiment, but most negative messages wouldn't be considered controversial, and not all controversial messages would be considered negative), spamminess (this goes somewhat hand-in-hand with formality) and confidence (a question would be considered unconfident, a response containing "I think", "maybe", "probably" or "I don't know" would be slightly more confident, and a statement-like response would be much more confident). Of course, this could increase generation time, and finding useful models for these classifications could be difficult, but I think they would be very useful, so it's worth looking into.
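To make the confidence idea a bit more concrete, here is a rough rule-based sketch of the three levels described above. It is only an illustration and not how the project classifies anything today: the function name `confidence_level`, the `HEDGES` list and the "low" / "medium" / "high" labels are all made up for this example, and a trained model would probably do better than keyword matching.

```python
import re

# Hypothetical hedge phrases taken from the examples in this issue.
# A real list would need tuning and would differ per language.
HEDGES = ("i think", "maybe", "probably", "i don't know")


def confidence_level(message: str) -> str:
    """Return a coarse confidence label ("low", "medium" or "high")."""
    # Normalize case and curly apostrophes so "I don’t know" still matches.
    text = message.strip().lower().replace("\u2019", "'")

    # A question is treated as the least confident kind of message.
    if text.endswith("?"):
        return "low"

    # A hedged response is slightly more confident than a question.
    for hedge in HEDGES:
        if re.search(r"\b" + re.escape(hedge) + r"\b", text):
            return "medium"

    # Anything else is treated as a plain statement.
    return "high"


if __name__ == "__main__":
    for sample in ("Is this right?", "I think it works", "It works."):
        print(sample, "->", confidence_level(sample))
```

Something this simple would add very little generation time, but it only works for English and would miss sarcasm, rhetorical questions and hedges that are not in the list, so it is better seen as a baseline to compare a proper model against.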

mlomb / chat-analytics

More text classification data #57