OpenAI's Evals library is a great resource providing evaluation sets for LLMs.
This repo provides a hub for exploring the results of these evaluations using the Zeno evaluation tool.
To add new evals, add a new entry to `evals/evals.yaml` with the following fields (a sample entry is sketched after this list):

`results-file`
: The first `.jsonl` result from oaievals

`link`
: A link to the evals commit for this evaluation

`description`
: A succinct description of what the evaluation is testing

`second-results-file`
: An optional second `.jsonl` result from oaievals. Must be the same dataset as the first one.

`functions-file`
: An optional Python file with Zeno functions for the evaluations.
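As a rough illustration, an entry might look something like the sketch below. The eval name, file paths, and commit link are placeholders, and the exact layout (e.g. whether entries are keyed by name or listed) should be copied from the existing entries in `evals/evals.yaml`:

```yaml
# Hypothetical entry -- the eval name, paths, and link are placeholders;
# mirror the structure of the existing entries in evals/evals.yaml.
my-new-eval:
  results-file: evals/my-new-eval-gpt-3.5-turbo.jsonl
  second-results-file: evals/my-new-eval-gpt-4.jsonl # optional, same dataset
  link: https://github.com/openai/evals/commit/<commit-hash>
  description: Tests whether the model can <describe the task here>.
  functions-file: evals/my_new_eval_functions.py # optional Zeno functions
```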
Make sure you test your evals locally before submitting a PR!

```bash
poetry install
python -m zeno-evals-hub evals/evals.yaml
```