simonw / llm-evals-plugin

Run evals using LLM

Ability to store evals in the database and run them from there too #7

Open simonw opened 2 months ago

simonw commented 2 months ago

Could be something like this:

llm evals add simple simple.yml -d evals.db

Then later:

llm evals run simple -m claude-3-sonnet -d evals.db

Without the -d option the default SQLite database for LLM would be used.
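A rough sketch of what `add` and the lookup-by-name half of `run` might do under the hood, using only Python's built-in `sqlite3` module. The `evals` table, its columns, and the `add_eval` / `load_eval` helpers are all hypothetical names made up for illustration; nothing here is the plugin's actual schema or API.

```python
import sqlite3

# Hypothetical schema: one row per named eval, holding the raw YAML text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS evals (
    name TEXT PRIMARY KEY,
    yaml TEXT NOT NULL
)
"""


def add_eval(conn: sqlite3.Connection, name: str, yaml_text: str) -> None:
    """Store an eval definition under a name, replacing any earlier copy."""
    conn.execute(SCHEMA)
    conn.execute(
        "INSERT OR REPLACE INTO evals (name, yaml) VALUES (?, ?)",
        (name, yaml_text),
    )
    conn.commit()


def load_eval(conn: sqlite3.Connection, name: str) -> str:
    """Fetch the stored YAML for a named eval, or raise KeyError."""
    conn.execute(SCHEMA)
    row = conn.execute(
        "SELECT yaml FROM evals WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"No eval named {name!r} in this database")
    return row[0]


if __name__ == "__main__":
    # In-memory database standing in for evals.db; the YAML is a placeholder.
    conn = sqlite3.connect(":memory:")
    add_eval(conn, "simple", "placeholder: yaml text\n")
    print(load_eval(conn, "simple"))
```

Keying on the name keeps `llm evals run simple` a single lookup, at the cost of overwriting the earlier definition whenever the same name is added again.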

simonw commented 2 months ago

Running an eval from a YAML file (or a URL to a YAML file) will also save a copy of that eval in the database, so anything you've run once can be run again using only the database its results were saved to.
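One possible shape for this, as a sketch only: store the eval text keyed by its SHA-256 hash the first time it is run, and point each run row at that copy. The table names (`eval_copies`, `eval_runs`), the `run_and_record` / `rerun_from_db` helpers, and the `runner` callable are all assumptions for illustration; `runner` is a stand-in for whatever actually executes the eval against a model.

```python
import hashlib
import sqlite3
from typing import Callable

# Stand-in type: takes (yaml_text, model) and returns a result string.
Runner = Callable[[str, str], str]


def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical tables if they do not exist yet."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS eval_copies (
            sha256 TEXT PRIMARY KEY,
            source TEXT NOT NULL,
            yaml TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS eval_runs (
            id INTEGER PRIMARY KEY,
            eval_sha256 TEXT NOT NULL REFERENCES eval_copies(sha256),
            model TEXT NOT NULL,
            result TEXT NOT NULL
        );
        """
    )


def run_and_record(
    conn: sqlite3.Connection, source: str, yaml_text: str, model: str, runner: Runner
) -> int:
    """Run an eval and save both a copy of its definition and the result."""
    init_db(conn)
    digest = hashlib.sha256(yaml_text.encode("utf-8")).hexdigest()
    # Identical YAML is stored once, however many times it is run.
    conn.execute(
        "INSERT OR IGNORE INTO eval_copies (sha256, source, yaml) VALUES (?, ?, ?)",
        (digest, source, yaml_text),
    )
    result = runner(yaml_text, model)
    cur = conn.execute(
        "INSERT INTO eval_runs (eval_sha256, model, result) VALUES (?, ?, ?)",
        (digest, model, result),
    )
    conn.commit()
    return cur.lastrowid


def rerun_from_db(
    conn: sqlite3.Connection, run_id: int, model: str, runner: Runner
) -> int:
    """Repeat an earlier run using only the copy saved in the database."""
    row = conn.execute(
        "SELECT c.source, c.yaml FROM eval_runs AS r "
        "JOIN eval_copies AS c ON c.sha256 = r.eval_sha256 WHERE r.id = ?",
        (run_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"No run with id {run_id}")
    return run_and_record(conn, row[0], row[1], model, runner)
```

Hashing the content means an edited YAML file gets its own row rather than silently replacing the definition that older results were produced from.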

This will also help with running evals over time, e.g. to check whether the API version of a model gives different results than it did a few months ago.
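To make the "over time" comparison concrete, here is a sketch of a query that lists the cases whose output changed between two runs of the same eval against the same model. It assumes a hypothetical `results` table with one row per case per run date; the schema, the `changed_cases` helper, and the sample rows are invented for illustration and are not real eval output.

```python
import sqlite3

# Hypothetical schema: one row per eval case per run date.
SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    eval_name TEXT NOT NULL,
    model TEXT NOT NULL,
    case_id TEXT NOT NULL,
    run_date TEXT NOT NULL,
    output TEXT NOT NULL
)
"""


def changed_cases(
    conn: sqlite3.Connection,
    eval_name: str,
    model: str,
    earlier_date: str,
    later_date: str,
) -> list:
    """Return (case_id, earlier_output, later_output) for cases that differ."""
    return conn.execute(
        """
        SELECT earlier.case_id, earlier.output, later.output
        FROM results AS earlier
        JOIN results AS later
          ON later.eval_name = earlier.eval_name
         AND later.model = earlier.model
         AND later.case_id = earlier.case_id
        WHERE earlier.eval_name = ?
          AND earlier.model = ?
          AND earlier.run_date = ?
          AND later.run_date = ?
          AND earlier.output != later.output
        ORDER BY earlier.case_id
        """,
        (eval_name, model, earlier_date, later_date),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up placeholder rows, only to exercise the query.
    conn.executemany(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
        [
            ("simple", "claude-3-sonnet", "case-1", "2024-01-01", "same"),
            ("simple", "claude-3-sonnet", "case-2", "2024-01-01", "before"),
            ("simple", "claude-3-sonnet", "case-1", "2024-04-01", "same"),
            ("simple", "claude-3-sonnet", "case-2", "2024-04-01", "after"),
        ],
    )
    for row in changed_cases(
        conn, "simple", "claude-3-sonnet", "2024-01-01", "2024-04-01"
    ):
        print(row)
```

Exact string comparison is the crudest possible notion of "different results"; a real comparison would more likely diff pass/fail status or scores per case.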