Add 2 toxic content classification datasets

mainlp / awesome-human-label-variation

A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, accompanying The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation (EMNLP 2022)


Add 2 toxic content classification datasets #1

Closed: paul-rottger closed this 2 years ago

paul-rottger commented 2 years ago

Adding two awesome datasets with human label variation (+sociodemographics!) for toxic content classification :) The Kumar et al. (2021) dataset is the one used by Gordon et al. (2022) for their Jury Learning paper!

bplank commented 2 years ago

Hi Paul. Awesome, thanks for the contribution!