openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Improvements to `Match`: case insensitive and strip #1421

Open LoryPack opened 10 months ago


Describe the feature or improvement you're requesting

The current implementation of the `Match` basic eval template is case-sensitive, so an answer that differs from the expected one only in capitalization is marked incorrect. For example:

{'correct': False, 'expected': 'no', 'picked': None, 'sampled': 'No', 'options': ['no']}

Similarly, `Match` does not strip leading whitespace from the sampled string. This causes the evaluation to fail for models that use the Completion endpoint, since those are more likely to emit a leading space. Example:

{'correct': False, 'expected': 'Mumbai', 'picked': None, 'sampled': ' Mumbai', 'options': ['Mumbai']}

It would be good to add arguments to `Match` that enable case-insensitive comparison and control whether whitespace is stripped from the answer before comparing. These could then be specified in the YAML file for a task.

Similar options can be added for the other templates, such as Includes and FuzzyMatch.
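
To make the request concrete, here is a minimal, self-contained sketch of the behaviour I have in mind. It is not the existing `Match` code: the function name `check_match` and the arguments `ignore_case` and `strip` are hypothetical, and I am assuming the comparison is a prefix check of the sampled text against the expected answer. The returned record mirrors the shape of the results quoted above.

```python
from typing import Optional


def check_match(
    sampled: str,
    expected: str,
    ignore_case: bool = False,  # hypothetical option, not an existing argument
    strip: bool = False,        # hypothetical option, not an existing argument
) -> dict:
    """Compare a sampled completion with the expected answer.

    Both options default to False, so the default behaviour stays
    case-sensitive and whitespace-sensitive.
    """
    options = [expected]

    # Normalize only the copy used for comparison; the record keeps the raw text.
    candidate = sampled.strip() if strip else sampled
    if ignore_case:
        candidate = candidate.casefold()

    picked: Optional[str] = None
    for option in options:
        target = option.casefold() if ignore_case else option
        # Assumption: a prefix comparison, so "No, because..." would match "no"
        # when ignore_case is enabled.
        if candidate.startswith(target):
            picked = option
            break

    return {
        "correct": picked is not None,
        "expected": expected,
        "picked": picked,
        "sampled": sampled,
        "options": options,
    }


# The two failing cases from this issue, with the options switched on.
print(check_match("No", "no", ignore_case=True))
print(check_match(" Mumbai", "Mumbai", strip=True))
```

With both options left at their defaults, the two calls reproduce the failures shown above, so existing evals would not change unless a task opts in.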
