A curated list of awesome datasets with human label variation (un-aggregated labels) in Natural Language Processing and Computer Vision, accompanying "The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation" (EMNLP 2022).
Adding two awesome datasets with human label variation (+sociodemographics!) for toxic content classification :) The Kumar et al. (2021) dataset is the one used by Gordon et al. (2022) for their Jury Learning paper!