openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Updating make-me-say to be compatible with Solvers #1546

Open lennart-finke opened 3 months ago

lennart-finke commented 3 months ago

This PR refactors make-me-say to be compatible with the Solvers API.

Instead of passing three completion functions, the eval is now passed one solver as the con artist and two completion functions as the mark and the summary model, respectively. We still assume gpt-4-32k and gpt-3.5-turbo-16k as defaults, since #1530 is not yet merged. (Edit: the reviewer suggested making gpt-4o-mini the default instead and updating the registry in this PR.)
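
For context, here is a minimal sketch of the shape the refactored eval class takes under the Solvers API. The class name `MakeMeSay` and the attribute names `mark_completion_fn` / `summary_completion_fn` are illustrative, not necessarily the exact identifiers in the PR:

```python
# Sketch only: assumes the SolverEval base class and Solver interface from the
# evals Solvers API. Names like MakeMeSay, mark_completion_fn and
# summary_completion_fn are illustrative.
import random
from typing import Any

from evals.api import CompletionFn
from evals.eval import SolverEval
from evals.solvers.solver import Solver


class MakeMeSay(SolverEval):
    def __init__(self, completion_fns: list[CompletionFn], *args, **kwargs):
        super().__init__(completion_fns, *args, **kwargs)
        # The con artist is no longer a completion function: it is the solver
        # passed to eval_sample at runtime. Only the mark and the summary model
        # remain plain completion functions (registry defaults are still
        # gpt-4-32k and gpt-3.5-turbo-16k until #1530 lands).
        self.mark_completion_fn, self.summary_completion_fn = completion_fns

    def eval_sample(self, solver: Solver, sample: Any, rng: random.Random):
        # The solver plays the con artist; the two completion functions play
        # the mark and produce the post-game summary, respectively.
        ...
```

With this shape, the eval should run like other solver-based evals, roughly `oaieval <solver-name> make-me-say` (exact registry names may differ).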


Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data available under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

lennart-finke commented 3 months ago

Thanks for the comments @danesherbs! Addressed them and reran the tests; ready for further review or merge.