aronwc commented 9 years ago

Train on three classes (1: positive, 0: neutral, -1: negative).
Predict on all organic tweets.
Remove those with predicted label = 0.
Plot the number of positive/negative tweets per month.
Plot % positive (= #pos / (#pos + #neg)) per month.
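A minimal Python sketch of steps 3 to 5 (drop neutrals, count positives and negatives per month, compute % positive). It assumes the classifier's output is available as (month, label) pairs; the function name monthly_sentiment and the "YYYY-MM" month keys are hypothetical, and the plotting itself is left out.

```python
from collections import Counter

def monthly_sentiment(predictions):
    """Return (month, #pos, #neg, % positive) rows sorted by month.

    predictions: iterable of (month, label) pairs (hypothetical format),
    where label is 1 (positive), 0 (neutral) or -1 (negative).
    """
    pos = Counter()
    neg = Counter()
    for month, label in predictions:
        if label == 1:
            pos[month] += 1
        elif label == -1:
            neg[month] += 1
        # label == 0 (neutral) is simply dropped
    rows = []
    # Only months with at least one positive or negative tweet appear,
    # so p + n is never zero below.
    for month in sorted(set(pos) | set(neg)):
        p, n = pos[month], neg[month]
        rows.append((month, p, n, 100.0 * p / (p + n)))
    return rows

example = [("2014-10", 1), ("2014-10", -1), ("2014-10", 0),
           ("2014-11", 1), ("2014-11", 1), ("2014-11", -1), ("2014-11", 1)]
for row in monthly_sentiment(example):
    print(row)
```

Months that contain only neutral predictions are omitted, which also avoids a division by zero in the % positive column.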

ElaineResende commented 9 years ago

I have run the classifier with 3 classes (1: positive, 0: neutral, -1: negative).
Our training set contains 676 (33.8%) positive, 275 (13.7%) negative, and 1049 (52.5%) neutral tweets.
Our results after classifying the 900k tweets:
- 8% negative, 58.6% neutral, 33.4% positive, which is comparable to the training set.
- the % of positive tweets is especially close (33.4% vs. 33.8% in the training set).
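The training-set percentages follow directly from the raw counts (676 + 275 + 1049 = 2000 labeled tweets). A small sketch of that arithmetic; the helper name class_percentages is hypothetical:

```python
def class_percentages(counts):
    """Return each class's share of the total number of tweets, in percent."""
    total = sum(counts.values())
    return {label: 100.0 * c / total for label, c in counts.items()}

# Training-set counts from the comment above.
training = {"positive": 676, "negative": 275, "neutral": 1049}
for label, pct in class_percentages(training).items():
    print(f"{label}: {pct:.2f}%")
```

Printed to two decimals, the shares are 33.80%, 13.75% and 52.45%.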

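This plot shows all organic tweets: [image: 3classes_allorganic] https://cloud.githubusercontent.com/assets/8547396/8362037/2cc65006-1b3c-11e5-92eb-aeb2bd0a86e8.png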

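This plot shows one tweet per user: [image: 3classes_onetweetperuser] https://cloud.githubusercontent.com/assets/8547396/8362036/2cc570b4-1b3c-11e5-9f29-5b0608617ff2.png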

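The plots below are the same as above, but exclude neutral tweets: [image: noneutral_3classes_allorganic] https://cloud.githubusercontent.com/assets/8547396/8362070/63fa721e-1b3c-11e5-8665-c430ea555a30.png [image: noneutral_3classes_onetweetperuser] https://cloud.githubusercontent.com/assets/8547396/8362071/63fad7a4-1b3c-11e5-9f12-7e2fd0c2875e.png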

The spikes did not change; they remain in the same months as before. The shape of the percentage of users who tweeted positively (below) is also almost the same, except in the first months (October through February). The percentage of positive tweets is higher than in the first plot we made.

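Percentage of positive tweets by month: [image: sentiment] https://cloud.githubusercontent.com/assets/8547396/8362100/9ad4494a-1b3c-11e5-9d65-0f1bc30d1ccf.png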
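The thread does not say how the single tweet per user was chosen for the one-tweet-per-user plots. As an illustration only, this sketch keeps the first tweet seen for each user in input order; the tweet dicts and the user_id key are hypothetical.

```python
def one_tweet_per_user(tweets):
    """Keep only the first tweet seen for each user, preserving order.

    tweets: iterable of dicts with at least a "user_id" key
    (a hypothetical field name, used here for illustration).
    """
    seen = set()
    kept = []
    for tweet in tweets:
        uid = tweet["user_id"]
        if uid not in seen:
            seen.add(uid)
            kept.append(tweet)
    return kept

tweets = [
    {"user_id": "a", "month": "2014-10", "label": 1},
    {"user_id": "b", "month": "2014-10", "label": -1},
    {"user_id": "a", "month": "2014-11", "label": 0},
]
print(one_tweet_per_user(tweets))
```

A per-month variant would add (user_id, month) pairs to seen instead, so each user can contribute one tweet to every month.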

aronwc commented 9 years ago

Great!

For the final graph, can you try plotting

#pos / (#pos + #neg)

(that is, exclude neutrals)


ElaineResende commented 9 years ago

Yes, sure. Just to confirm: previously I was calculating the percentage of users who tweeted positively as: for x, y in zip(values_all_users, values_positive_users): positive_percentil.append(y*100/float(x))

where values_all_users is a counter of tweets by month and
values_positive_users is a counter of positive tweets by month.

Is that wrong?
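To make the difference concrete: the loop above divides positives by all tweets in the month, so neutral tweets sit in the denominator, while the requested formula divides by positives plus negatives only. A sketch of both, reusing the comment's values_all_users / values_positive_users names; the function names, the negative counts and all the numbers are made up for illustration.

```python
def pct_positive_of_all(values_all_users, values_positive_users):
    # Original calculation: positives over ALL tweets in the month,
    # so neutral tweets are included in the denominator.
    return [100.0 * y / x for x, y in zip(values_all_users, values_positive_users)]

def pct_positive_of_polar(values_positive, values_negative):
    # Requested calculation: #pos / (#pos + #neg), neutrals excluded.
    return [100.0 * p / (p + n) for p, n in zip(values_positive, values_negative)]

# Hypothetical monthly counts, for illustration only.
all_tweets = [100, 200]
positive = [30, 70]
negative = [10, 30]
print(pct_positive_of_all(all_tweets, positive))   # [30.0, 35.0]
print(pct_positive_of_polar(positive, negative))   # [75.0, 70.0]
```

In this toy example the second month looks more positive under the first formula (35% vs. 30%) but less positive under the second (70% vs. 75%), because the first month has a larger neutral share (60% vs. 50%). A month with no positive or negative tweets would raise ZeroDivisionError in pct_positive_of_polar and would need to be skipped.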

ElaineResende commented 9 years ago

Using #pos / (#pos + #neg), i.e. excluding neutral tweets, the final graph is: sentiment2correct

aronwc commented 9 years ago

Looks right to me. Please add to the paper.

ElaineResende commented 9 years ago

Yes, sure.

I am improving the report right now: correcting some mistakes, improving the writing, and adding other ideas. If you don't mind, I am also going to switch to a better-organized template.


Re-run using three-class classifier #8
