stanford-crfm / helm

Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
https://crfm.stanford.edu/helm
Apache License 2.0
1.92k stars, 245 forks

Proposal: Add free-form data fields to Instance #3057

Open yifanmai opened 2 weeks ago

yifanmai commented 2 weeks ago

Background

The current schema for Instance is very limited: it allows only a single Input and a List[Reference]. This is appropriate for most question answering scenarios and traditional NLP tasks, but other scenarios contain additional structured data for each instance.

Examples:

Proposal

Add the field data: Dict[str, Any] to Instance. The value of data can contain nested dictionaries and arrays, but every leaf value must be a str or a number, and the whole object should be serializable to JSON.

Scenarios can use data however they wish, e.g. setting data to {"category": ""} or {"ehr": "...", "question": "..."}
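A minimal sketch of what the proposed field could look like, to make the constraints concrete. The Input, Reference and Instance classes here are simplified stand-ins, not the actual HELM definitions, and validate_data is a hypothetical helper. Rejecting bool and None leaves and non-finite floats is an assumption drawn from the "str and number" wording above, not something the proposal specifies.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Input:
    """Simplified stand-in for the real Input class."""
    text: str = ""


@dataclass(frozen=True)
class Reference:
    """Simplified stand-in for the real Reference class."""
    text: str = ""


def validate_data(value: Any) -> None:
    """Hypothetical check: only dicts, lists, and str/number leaves are allowed."""
    # bool is a subclass of int in Python; excluding it is an assumption.
    if isinstance(value, bool):
        raise TypeError("bool leaves are not allowed in this sketch")
    if isinstance(value, (str, int, float)):
        return
    if isinstance(value, dict):
        for key, child in value.items():
            if not isinstance(key, str):
                raise TypeError(f"non-string key: {key!r}")
            validate_data(child)
        return
    if isinstance(value, list):
        for child in value:
            validate_data(child)
        return
    raise TypeError(f"unsupported value type: {type(value).__name__}")


@dataclass(frozen=True)
class Instance:
    input: Input
    references: List[Reference]
    # Proposed free-form field; defaults to empty so existing scenarios are unaffected.
    data: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not isinstance(self.data, dict):
            raise TypeError("data must be a dict")
        validate_data(self.data)
        # Raises ValueError for NaN or infinity, which are not valid JSON.
        json.dumps(self.data, allow_nan=False)


instance = Instance(
    input=Input(text="..."),
    references=[],
    data={"ehr": "...", "question": "..."},
)
```

Defaulting data to an empty dictionary would let scenarios that do not need the field keep constructing Instance exactly as they do today.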

The frontend can traverse the data object and render a list of key-value pairs, where each key is the JSON path of a leaf value.
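The traversal could look roughly like the sketch below. It is written in Python for illustration only (the frontend itself would do this in its own language), flatten_data is a hypothetical name, and the "$"-rooted path syntax is an assumption, since the proposal does not fix a path format.

```python
from typing import Any, List, Tuple


def flatten_data(value: Any, path: str = "$") -> List[Tuple[str, Any]]:
    """Return (json_path, leaf_value) pairs in traversal order."""
    if isinstance(value, dict):
        pairs: List[Tuple[str, Any]] = []
        for key, child in value.items():
            pairs.extend(flatten_data(child, f"{path}.{key}"))
        return pairs
    if isinstance(value, list):
        pairs = []
        for index, child in enumerate(value):
            pairs.extend(flatten_data(child, f"{path}[{index}]"))
        return pairs
    # Anything that is not a dict or list is treated as a leaf.
    return [(path, value)]


rows = flatten_data({"category": "x", "ehr": {"notes": ["a", "b"]}})
# rows == [("$.category", "x"), ("$.ehr.notes[0]", "a"), ("$.ehr.notes[1]", "b")]
```

Two caveats with this sketch: empty dictionaries and arrays produce no rows, and keys that themselves contain "." or "[" would yield ambiguous paths unless they are quoted or escaped.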

Open Questions

Alternatives Considered

yifanmai commented 2 days ago

MMLU and GPQA both need this because they both need to store an extra chain-of-thought annotation. See: #3088