Open jowagner opened 1 year ago
@jowagner I found a total of 946 cases: 697 in 'train', 174 in 'test' and 75 in 'dev'. See the attached .csv.
The columns denote: ind_sexist_count -> number of times labelled as 'sexist';
ind_not-sexist_count -> number of times labelled as 'not sexist'; agg_label -> aggregated label.
The task organisers state the following in their paper:
"Three annotators labelled each entry. To further ensure label quality, we rely on expert adjudication for disagreements. Experts were called upon to give labels for (i) cases with less than 3/3 agreement (unanimous) in Task A, and (ii) cases with less than 2/3 agreement in Tasks B and C."
https://arxiv.org/pdf/2303.04222.pdf (3.4 Annotator Process)
I do not know how to pose the question for
In our deeper analysis of 10 samples, we found that sexism2022_english-15683 has the label 'not sexist' but all 3 individual annotators tagged the post as 'sexist'. A possible explanation may be that the annotators disagreed on the fine-grained sexism annotation; this triggered a review of the case, and the final decision was that all 3 annotators were wrong about the binary classification.
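For anyone who wants to re-check cases like this one, here is a minimal sketch of how rows with a unanimous annotator vote but a different aggregated label could be flagged. It assumes the column names described above (ind_sexist_count, ind_not-sexist_count, agg_label); the `rewire_id` column name, the function name and every sample row except the first are made up for illustration and may not match the attached .csv.

```python
import csv
import io

# Toy CSV for illustration. Only the first row reflects a case mentioned in
# this thread; the other rows and the `rewire_id` column name are assumptions.
SAMPLE_CSV = """rewire_id,ind_sexist_count,ind_not-sexist_count,agg_label
sexism2022_english-15683,3,0,not sexist
example-1,3,0,sexist
example-2,1,2,not sexist
example-3,0,3,sexist
"""


def unanimous_but_overridden(rows, n_annotators=3):
    """Return ids of rows where all annotators agreed on the binary label
    but the aggregated label says the opposite."""
    flagged = []
    for row in rows:
        sexist = int(row["ind_sexist_count"])
        not_sexist = int(row["ind_not-sexist_count"])
        if sexist == n_annotators:
            unanimous = "sexist"
        elif not_sexist == n_annotators:
            unanimous = "not sexist"
        else:
            continue  # annotators disagreed, so adjudication is expected
        if row["agg_label"].strip() != unanimous:
            flagged.append(row["rewire_id"])
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
flagged = unanimous_but_overridden(rows)
print(flagged)  # ['sexism2022_english-15683', 'example-3']
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an opened file handle and adjust the id column name to whatever the .csv actually uses.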