Confusion matrix: Precision and recall

maytepenella commented 6 years ago

The matrix is divided into four quadrants, which contain:

True Positives (TP): Positive samples predicted as positive.
True Negatives (TN): Negative samples predicted as negative.
False Positives (FP): Negative samples predicted as positive.
False Negatives (FN): Positive samples predicted as negative.
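To make the link to the issue title explicit, here is a minimal sketch of how the four counts turn into precision and recall. It assumes labels are plain strings and that "positive" is the class of interest; the function names are illustrative and are not taken from the notebook.

```python
def binary_confusion_counts(y_true, y_pred, positive="positive"):
    """Count TP, TN, FP, FN, treating every non-`positive` label as negative."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # positive sample predicted as positive
            else:
                fp += 1  # negative sample predicted as positive
        else:
            if actual == positive:
                fn += 1  # positive sample predicted as negative
            else:
                tn += 1  # negative sample predicted as negative
    return tp, tn, fp, fn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Precision answers "of the tweets we called positive, how many really were?", while recall answers "of the truly positive tweets, how many did we find?".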

Should we consider a 3 by 3 matrix (positive, neutral, negative)?
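For reference, a rough sketch of what the 3 by 3 version could look like, with precision and recall computed per class. It assumes rows are actual labels and columns are predicted labels; the label order and function names are my own choices, not something from the notebook.

```python
LABELS = ["negative", "neutral", "positive"]  # assumed label names and order


def confusion_matrix_3x3(y_true, y_pred, labels=LABELS):
    """Build a matrix where matrix[i][j] counts actual labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def per_class_precision_recall(matrix, labels=LABELS):
    """Return {label: (precision, recall)} using a one-vs-rest view of each class."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                             # diagonal cell
        predicted_as = sum(row[i] for row in matrix)  # column total: TP + FP
        actually_is = sum(matrix[i])                  # row total: TP + FN
        precision = tp / predicted_as if predicted_as else 0.0
        recall = tp / actually_is if actually_is else 0.0
        scores[label] = (precision, recall)
    return scores
```

With three classes, each class gets its own precision and recall by treating it as "positive" and the other two as "negative".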

maytepenella commented 6 years ago

Implemented in notebook "Machine Learning IIpy36.ipynb" from 11_SupervisedLearning classes

borbota commented 6 years ago

[Screenshot from 2018-04-01 16-37-21] For the code, see my notebook.

maytepenella commented 6 years ago

We have the following percentages of tweets:

negative tweets: 62.9667577413
neutral tweets: 21.0154826958
positive tweets: 16.0177595628
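One thing this imbalance implies: a classifier that always answers "negative" would already be right on about 63% of tweets, so plain accuracy needs to be read against that baseline. A small sketch of the arithmetic, using only the percentages above (the variable names are illustrative):

```python
# Class shares in percent, copied from the figures above.
class_share = {
    "negative": 62.9667577413,
    "neutral": 21.0154826958,
    "positive": 16.0177595628,
}

# A "majority class" baseline always predicts the most frequent label,
# so its accuracy equals that label's share of the data.
majority_label = max(class_share, key=class_share.get)
baseline_accuracy = class_share[majority_label] / 100
```

This is why per-class precision and recall are more informative here than a single accuracy number.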

We are relatively good at classifying negative tweets, but neutral and positive tweets degrade performance a lot. Maybe we should first separate tweets with sentiment from tweets without sentiment (as described in the reference provided by Laia: https://github.com/ayushoriginal/Sentiment-Analysis-Twitter ).
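A rough sketch of how that two-step split could be wired together. This is only my reading of the idea, not code from the linked repository: `has_sentiment` and `polarity` are hypothetical callables standing in for two separately trained classifiers.

```python
def subjectivity_label(label):
    """Map the three original labels to the first-stage target."""
    return "no_sentiment" if label == "neutral" else "sentiment"


def two_stage_predict(text, has_sentiment, polarity):
    """Stage 1 decides whether the tweet carries sentiment at all;
    stage 2 runs only on tweets that do, and picks positive or negative.

    `has_sentiment(text) -> bool` and `polarity(text) -> str` are
    placeholders for whatever models we end up training.
    """
    if not has_sentiment(text):
        return "neutral"
    return polarity(text)
```

The first-stage model would be trained on labels produced by `subjectivity_label`, and the second-stage model only on the positive and negative tweets.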

We should also analyze why we fail at positive tweets.

jsantalo / happybirds

Confusion matrix: Precision and recall #10