stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in Holistic Evaluation of Text-to-Image Models (HEIM) (https://arxiv.org/abs/2311.04287).
https://crfm.stanford.edu/helm
Apache License 2.0

Add WILDS data for CivilComments #913

Closed: rishibommasani closed this issue 1 year ago

rishibommasani commented 1 year ago

We currently use some Kaggle data for CivilComments, but we should use the more authoritative WILDS data, which also merges sparse categories in a sensible way. (I guess the "other religion" category makes sense, but it is a bit weird given that Judaism, Hinduism, and Buddhism are major world religions that just aren't that frequent in the data.)

We should add a flag to toggle between the two datasets and, once that is done, switch to WILDS as the default.
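
For illustration only, here is a minimal sketch of what such a toggle might look like. It is not HELM's actual code. The enum, the function names, the column names (`jewish`, `hindu`, `buddhist`, `other_religion`, `other_religions`), and the max-based merge rule are all hypothetical assumptions, chosen only to show a source flag that defaults to WILDS and folds sparse religion categories into one.

```python
from enum import Enum
from typing import Dict


class CivilCommentsSource(Enum):
    """Hypothetical flag for choosing which CivilComments data to load."""
    KAGGLE = "kaggle"
    WILDS = "wilds"


# Assumed column names; the real datasets may label these differently.
SPARSE_RELIGION_COLUMNS = ("jewish", "hindu", "buddhist", "other_religion")
MERGED_COLUMN = "other_religions"


def merge_sparse_religions(row: Dict[str, float]) -> Dict[str, float]:
    """Fold the sparse religion columns into one merged column.

    The merged score is the maximum of the sparse scores (an assumed rule).
    """
    merged = {k: v for k, v in row.items() if k not in SPARSE_RELIGION_COLUMNS}
    merged[MERGED_COLUMN] = max(row.get(c, 0.0) for c in SPARSE_RELIGION_COLUMNS)
    return merged


def load_identity_labels(
    row: Dict[str, float],
    source: CivilCommentsSource = CivilCommentsSource.WILDS,  # WILDS as default
) -> Dict[str, float]:
    """Return identity labels in the schema of the requested source."""
    if source is CivilCommentsSource.WILDS:
        return merge_sparse_religions(row)
    return dict(row)
```

Calling `load_identity_labels(row)` with no flag would give the merged schema, while passing `CivilCommentsSource.KAGGLE` would keep the original columns untouched.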

teetone commented 1 year ago

@rishibommasani, do we still want to support both, or should we just use the WILDS version? @ryanachi If you don't have the bandwidth, I can take this.

rishibommasani commented 1 year ago

We only need the WILDS one.