stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0
1.92k stars, 246 forks

include URL to paper for each scenario group #964

Closed: percyliang closed this issue 2 years ago

percyliang commented 2 years ago

Include a URL link to the paper or website in the description field of each scenario group so that people can click through to read more about the scenario(s). We can just assume that string value is a markdown string, which we can turn into HTML in the frontend using showdown or something.
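A minimal sketch of the idea, written in Python for illustration rather than in the frontend's JavaScript. It does not use showdown; `render_description` is a hypothetical helper that handles only inline `[text](url)` links, and the example.org URL is a placeholder, not a real paper link.

```python
import html
import re

# Matches inline markdown links of the form [text](url).
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def render_description(description: str) -> str:
    """Convert inline markdown links in a description to HTML anchors.

    Hypothetical helper: it handles only [text](url) links and escapes
    everything else, so it is far narrower than a real markdown library.
    """
    parts = []
    last_end = 0
    for match in LINK_PATTERN.finditer(description):
        # Escape the plain text between links so it is safe to embed in HTML.
        parts.append(html.escape(description[last_end:match.start()]))
        text = html.escape(match.group(1))
        url = html.escape(match.group(2), quote=True)
        parts.append(f'<a href="{url}">{text}</a>')
        last_end = match.end()
    # Escape whatever follows the last link.
    parts.append(html.escape(description[last_end:]))
    return "".join(parts)


# Placeholder URL, assumed for illustration only.
example = "Bias analysis of LM question-answering ([paper](https://example.org/bbq))."
rendered = render_description(example)
print(rendered)
```

Treating the description as markdown means the schema needs no new field: the link sits inside the existing string and only the rendering step changes.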

rishibommasani commented 2 years ago

Should this be part of the description or a new field for Scenario? I am fine with either; I'm just not sure which will be easier, and I'm happy to do whatever is simpler.

E.g. how should we format for BBQ?

Current description: "Bias analysis of LM question-answering." URL: https://crfm.stanford.edu/assets/helm.pdf#bbq (the URL doesn't exist yet; I am just putting it here as a placeholder)

percyliang commented 2 years ago

A Scenario is an artifact of implementation, not necessarily a conceptually meaningful unit - for example, multiple commonsense QA datasets belong to one Scenario... So I think it should be in the scenario group. We could consider removing the description for a Scenario.

percyliang commented 2 years ago

Just to be clear, I meant the original paper/website of the scenario/dataset. We can also link to the appropriate section in our paper, but I just want to make sure we're acknowledging the constituent datasets appropriately.
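To make the proposal concrete, here is a rough sketch of a scenario group that carries a markdown description linking to the original source, plus a check for groups that still lack a link. The `ScenarioGroup` class, its field names, the scenario names, and the example.org URL are all assumptions for illustration, not the actual HELM schema.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenarioGroup:
    """Hypothetical schema entry; the field names are assumptions."""

    name: str
    # Markdown string; the link should point at the original paper or website.
    description: str
    # Implementation-level scenarios that belong to this group.
    scenarios: List[str] = field(default_factory=list)


def has_source_link(group: ScenarioGroup) -> bool:
    """Return True if the description contains a markdown link to an http(s) URL."""
    pattern = r"\[[^\]]+\]\(https?://[^)\s]+\)"
    return re.search(pattern, group.description) is not None


groups = [
    ScenarioGroup(
        name="bbq",
        # Placeholder URL, not the real paper link.
        description="Bias analysis of LM question-answering ([paper](https://example.org/bbq-paper)).",
        scenarios=["bbq"],
    ),
    ScenarioGroup(
        name="commonsense",
        description="Commonsense question answering.",
        # One group can cover several datasets; these names are made up.
        scenarios=["commonsense_dataset_a", "commonsense_dataset_b"],
    ),
]

# Groups whose descriptions do not yet acknowledge a source.
missing = [g.name for g in groups if not has_source_link(g)]
print(missing)
```

Putting the link on the group rather than on each Scenario keeps the acknowledgement at the level readers actually see, even when several datasets share one implementation.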

rishibommasani commented 2 years ago

Oh I misunderstood, thanks for clarifying on the latter point - I am fully on board with giving visibility to the source.

rishibommasani commented 2 years ago

To the former comment:

  1. That makes sense to me.
  2. Removing descriptions makes sense to me. Anyway, I think the descriptions we actually want are the ones annotated for scenario groups in the schema (i.e. they are generally more standardized and of higher quality than the stuff in the various scenario.py files).

rishibommasani commented 2 years ago

@percyliang I think this is addressed by the final two comments in PR #976.

See https://github.com/stanford-crfm/benchmarking/pull/976/commits/14d6bdb4af25acb7816c6f9f4fa1193b6480d57c

What do you think?

percyliang commented 2 years ago

Yup!

rishibommasani commented 2 years ago

Fantastic, closing! (I responded to your comment in the commit; it was addressed in the next commit, and I should have linked to both above, though it's impressive you were so keen to notice the issue :) )