Closed by tuxpiper 8 years ago
Now, assume that for a cluster we have S support, D deny and Q question tweets, for a total of C = S+D+Q. Non-controversiality is given by:
score_of_noncontroversiality = 1/3* ( (S/C - 1/3)^2 + (D/C - 1/3)^2 + (Q/C - 1/3)^2)
If the score is large, the theme is non-controversial. If the score is close to 0, the theme is controversial.
This is based on the chi-squared test for uniform discrete distributions. I will write a formal argument when needed.
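To make the formula concrete, here is a minimal Python sketch of it. The function name and the choice to raise an error for an empty cluster (C = 0) are my own assumptions; the thread does not say how empty clusters should be handled.

```python
def noncontroversiality_score(support: int, deny: int, question: int) -> float:
    """Mean squared distance of the (S/C, D/C, Q/C) shares from the uniform 1/3."""
    total = support + deny + question
    if total == 0:
        # Assumption: the score is undefined for a cluster with no labelled tweets.
        raise ValueError("cluster has no labelled tweets")
    shares = (support / total, deny / total, question / total)
    return sum((share - 1 / 3) ** 2 for share in shares) / 3


# Cluster 1184 from the table: 22 support, 16 deny, 17 question.
print(noncontroversiality_score(22, 16, 17))
```

For cluster 1184 this should print a value that rounds to the 0.002277319 listed in the table.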
Here are some example scores that I ran on some 50 themes, together with the number of support, deny and question tweets for each. The rows are sorted by the score above in increasing order, so the most controversial themes come first and the least controversial last. Be careful here: a high score means low controversiality, it's the other way around :)
clusterID support deny question score
1184 22 16 17 0.002277319
1109 5885 4695 3334 0.005610682
1214 12 34 12 0.031972519
1150 15 9 3 0.032921811
106 264 40 142 0.042152913
1142 44 16 9 0.048029126
1070 20 15 1 0.049897119
1149 234 9 191 0.050494218
1195 13 10 0 0.058391094
1228 47 8 15 0.058820862
1023 14 2 5 0.058956916
1051 21 30 0 0.060745867
1002 50 5 16 0.072780974
1096 31 15 0 0.075719387
1076 133 37 10 0.085987654
1062 19 7 0 0.091058514
1012 26 7 1 0.098231449
1232 27 5 2 0.107458670
1024 16 65 1 0.111078062
1071 33 5 3 0.111573799
1167 144 27 5 0.120143193
1043 25 0 5 0.129629630
1050 20 4 0 0.129629630
1041 31 6 0 0.131645159
1018 6 32 0 0.133579563
1225 110 13 5 0.139010959
1162 25 1 3 0.140573391
1159 0 27 4 0.147300266
1025 194 25 3 0.147728810
1058 44 6 0 0.151822222
1164 46 5 1 0.152942143
1217 124 11 1 0.168192522
1165 21 2 0 0.169292166
0 33 2 1 0.170267490
1161 23 2 0 0.173155556
1205 82 6 1 0.173406837
1156 24 1 1 0.173898751
1180 57 3 1 0.180835498
1193 934 58 3 0.183739692
1237 78 3 1 0.190990812
1192 1 42 1 0.192952250
1091 1 61 1 0.201562106
1230 280 7 0 0.206358649
1200 3 329 0 0.216252560
1042 7101 1 0 0.222128365
119 65 0 0 0.222222222
1047 40 0 0 0.222222222
1075 110 0 0 0.222222222
1118 83 0 0 0.222222222
1136 80 0 0 0.222222222
We need to revisit the formula so that it returns controversiality rather than non-controversiality.
So, good news: the score seems to be bounded between 0 and 0.222222... (which is actually 2/9). To turn the score around, just use: 1 - 9/2*score . It will be 0 for non-controversial and 1 for most controversial. Tadaaa.. math magic :)
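As a quick check of the rescaling, here is a small self-contained Python sketch. The function name `controversiality` and the error for an empty cluster are assumptions for illustration, not something defined in this thread.

```python
def controversiality(support: int, deny: int, question: int) -> float:
    """Rescaled score: 0 for non-controversial, 1 for most controversial."""
    total = support + deny + question
    if total == 0:
        # Assumption: clusters without labelled tweets are rejected.
        raise ValueError("cluster has no labelled tweets")
    # Non-controversiality score from the formula above, bounded by 2/9.
    score = sum((n / total - 1 / 3) ** 2 for n in (support, deny, question)) / 3
    # Flip and rescale: 1 - 9/2 * score.
    return 1 - 9 / 2 * score


print(controversiality(65, 0, 0))    # all support: non-controversial end
print(controversiality(10, 10, 10))  # even split: most controversial end
```

Up to floating-point rounding, a cluster with only support tweets lands at 0 and an even three-way split lands at 1.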
Oh great, thanks @lauratolosi !
The following will give you the number of support, deny, question for each cluster. If you want, you can specify the cluster that you want to compute them for.
PREFIX pheme: <http://www.pheme.eu/ontology/pheme#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select ?eventId ?support (count(?support) as ?count) where {
  ?a a pheme:Tweet .
  ?a pheme:eventId ?eventId .
  ?a pheme:sdq ?support .
  ?a pheme:version "v7" .
  FILTER (xsd:integer(?eventId) > -1) .
}
group by ?eventId ?support
order by ?eventId
limit 100