sophieball / toxicity-detector

MIT License

Changed politeness histogram plots and the way we present n-grams #73

Closed: sophieball closed this 4 years ago

CaptainEmerson commented 4 years ago

The z-scores don't get sorted correctly in my ngram column:

abs(z-score)
1.127727465
1.127727465
1.127727465
0.9593232635
0.9267823327
0.8228805307
0.7318181857
0.6578762715
0.6578762715
0.357885226
3.196124698
2.247262102
2.211874137
2.055848259
1.803939237
1.803939237
1.803275636
1.774687938
1.694652204
1.54177375

Or maybe the formatting is wrong? Maybe some exponents are getting dropped?
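One way to rule out both a sorting and a formatting problem is to parse each value as a number before ranking it. The sketch below is only an illustration: the function name and the n-gram labels are hypothetical, and it assumes the column arrives as text. The values are four of the numbers listed above.

```python
from typing import List, Tuple


def sort_by_abs_z(rows: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Rank (ngram, z-score-as-text) rows by |z|, largest first.

    float() also accepts exponent notation such as "3.2e+00", so an
    exponent is not dropped the way it can be when comparing strings.
    """
    parsed = [(ngram, float(z_text)) for ngram, z_text in rows]
    return sorted(parsed, key=lambda row: abs(row[1]), reverse=True)


# Hypothetical n-gram labels; the values are copied from the list above.
rows = [
    ("ngram_a", "1.127727465"),
    ("ngram_b", "0.357885226"),
    ("ngram_c", "3.196124698"),
    ("ngram_d", "1.54177375"),
]
ranked = sort_by_abs_z(rows)
```

If the listed values actually come from two separate groups, each sorted on its own, the two descending runs would be expected, and the rows would need to be merged before a single ranking like this one.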

sophieball commented 4 years ago

The magic numbers make it seem like it would break easily if the input changes

It kinda came from here: http://languagelog.ldc.upenn.edu/myl/Monroe.pdf (Figure 2)

@bvasiles
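For context, the linked paper (Monroe et al.) scores each word with a z-score of the log-odds ratio under a Dirichlet prior. The sketch below assumes a uniform prior and uses hypothetical function and parameter names. The thread does not confirm that this PR computes its z-scores this way.

```python
import math
from typing import Dict


def log_odds_z_scores(
    counts_a: Dict[str, int],
    counts_b: Dict[str, int],
    prior: float = 0.01,  # assumed uniform Dirichlet prior per word
) -> Dict[str, float]:
    """z-score of the log-odds ratio of each word between two groups.

    Positive scores lean toward group A, negative toward group B.
    Assumes the combined vocabulary has at least two words.
    """
    vocab = set(counts_a) | set(counts_b)
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    prior_total = prior * len(vocab)

    scores = {}
    for word in vocab:
        y_a = counts_a.get(word, 0) + prior
        y_b = counts_b.get(word, 0) + prior
        # Difference of the smoothed log-odds in each group.
        delta = math.log(y_a / (n_a + prior_total - y_a)) - math.log(
            y_b / (n_b + prior_total - y_b)
        )
        # Approximate variance of that difference.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = delta / math.sqrt(variance)
    return scores


# Made-up counts, for illustration only.
z = log_odds_z_scores({"and": 10, "please": 2}, {"and": 2, "please": 10})
```

Ranking by the absolute value of these scores gives the kind of abs(z-score) column discussed in this thread.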

CaptainEmerson commented 4 years ago

I can approve this once the sorting issue is resolved.

sophieball commented 4 years ago

[image]

I've been trying macro-averaging: the count of each politeness strategy is divided by the number of sentences, so every strategy is now normalized by sentence count.
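A minimal sketch of that normalization, assuming per-comment strategy counts are available as dictionaries. The function names, strategy names, and counts here are hypothetical and not taken from the PR.

```python
from typing import Dict, List


def per_sentence_rates(
    strategy_counts: Dict[str, int], num_sentences: int
) -> Dict[str, float]:
    """Divide every strategy count by the number of sentences."""
    if num_sentences <= 0:
        # Avoid dividing by zero for empty comments.
        return {name: 0.0 for name in strategy_counts}
    return {name: count / num_sentences for name, count in strategy_counts.items()}


def macro_average(per_doc_rates: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-comment rates so each comment weighs the same."""
    if not per_doc_rates:
        return {}
    names = sorted({name for rates in per_doc_rates for name in rates})
    return {
        name: sum(rates.get(name, 0.0) for rates in per_doc_rates) / len(per_doc_rates)
        for name in names
    }


# Hypothetical strategy names and counts for two comments.
doc1 = per_sentence_rates({"Gratitude": 2, "Please": 1}, num_sentences=4)
doc2 = per_sentence_rates({"Gratitude": 0, "Please": 3}, num_sentences=6)
averaged = macro_average([doc1, doc2])
```

Dividing first and averaging second keeps a long comment from dominating the histogram just because it has more sentences.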

CaptainEmerson commented 4 years ago

I've added the results for the G data to the Google Drive. The unigrams are a bit weirder than they were before, including "and" as the top indicator of pushback.