Proposal: Add free-form data fields to Instance

Background

The current schema for Instance is very limited - it allows a single Input and a List[Reference]. This is appropriate for most question answering scenarios and traditional NLP tasks, but other scenarios contain additional structured data for each instance.

Examples:

AIR-Bench, HarmBench, SimpleSafetyTests needs to store the taxonomy of categories. It currently puts these in references.
BBQ needs to store whether the question is ambiguous or not. It currently puts this in the tags of every reference (the same "ambiguous" / "non-ambiguous" tag is duplicated across references).
XSTest needs to store whether the question is safe or not. It currently uses the last reference for this.
MedAlign in #3038 needs to store the EHR context separately from the question, because it needs to truncate the EHR context in a special way before it is interpolated into the prompt.
MTBench, which is not yet implemented, needs to store the category of the instance, which is used to determine the prompt for the LLM-as-judge.

Proposal

Add the field data: Dict[str, Any] to Instance. The value of data can contain nested dictionaries and arrays, but only contains leaf str and number values, and should be serializable to JSON.

Scenarios can use data however they wish, e.g. setting data to {"category": ""} or {"ehr": "...", "question": "..."}

The frontend can traverse the data object and render a list of key value pairs, where the keys are the JSON path.

Open Questions

Should there be two separate fields, data and metadata?
Should be the field be named extra_data instead of data?

Alternatives Considered

Allow subclassing Input. The subclass can define new fields needed by the scenario. This allows better type-checking than the proposal. However, this doesn't work because subclasses of Input are not preserved after round-trip to JSON using dacite or cattrs. The extra fields defined in the subclass will be removed after the round trip. We need to perform this round trip in various places e.g. helm-summarize.
Make Instance itself a free-form dict. We have too many existing Instance field accesses, so this would require too many changes and potentially break other things.

stanford-crfm / helm