simonw / llm-evals-plugin

Run evals using LLM

Design and document checks and plugin hook #6

Open simonw opened 2 months ago

simonw commented 2 months ago

So people can see how they can implement their own checks.

I'll also release the first checks plugin, for SQLite query execution, mainly as a demo.

simonw commented 2 months ago

Current hook spec: https://github.com/simonw/llm-evals-plugin/blob/e427453a1995fdc9d2270581f33851c256bc0c3a/llm_evals/hookspecs.py#L6-L8
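
For reference, a minimal sketch of what that hook spec presumably declares, assuming the usual pluggy pattern that llm itself uses (a reconstruction, not the actual file contents):

    from pluggy import HookspecMarker

    hookspec = HookspecMarker("llm")

    @hookspec
    def register_eval_checks():
        "Return a list of check classes to register"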

Example usage:

https://github.com/simonw/llm-evals-plugin/blob/e427453a1995fdc9d2270581f33851c256bc0c3a/llm_evals/checks.py#L37-L46

    @llm.hookimpl
    def register_eval_checks():
        return [Iexact]
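
The linked Iexact check isn't reproduced above; based on the lowercase-match idea in the next comment, it presumably looks roughly like this (a guess at the shape, not the actual source - the constructor signature in particular is an assumption):

    class Iexact:
        # Case-insensitive exact match: passes if the response text equals
        # the expected text, ignoring case.
        def __init__(self, text):
            self.text = text

        def __call__(self, response):
            return self.text.lower() == response.text().lower()
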
simonw commented 2 months ago

I think there should probably be a Check class that all checks subclass. It could handle error reporting perhaps?

I'm also tempted to make checks use assert internally - something like this perhaps:

    assert self.text.lower() == response.text().lower(), "Lowercase match failed"
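
A rough sketch of how a Check base class could wrap that (the class and method names here are a proposal, not existing code):

    class Check:
        # Subclasses implement check() and raise AssertionError on failure;
        # the base class turns that into a (passed, message) result.
        def check(self, response):
            raise NotImplementedError

        def run(self, response):
            try:
                self.check(response)
            except AssertionError as ex:
                return False, str(ex) or "Check failed"
            return True, None

    class Iexact(Check):
        def __init__(self, text):
            self.text = text

        def check(self, response):
            assert self.text.lower() == response.text().lower(), "Lowercase match failed"
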
simonw commented 2 months ago

There could be an llm-evals-unsafe-python plugin that adds a check that looks like this:

    checks:
    - unsafe_python: assert response.text() == "Unsafely checked here"

(Is the "unsafe" designation really necessary here? I feel like it is, because I want to discourage people from running llm evals https://url-to-some-yaml-file against untrusted files that could execute arbitrary code if unsafe plugins are installed. But maybe having unsafe in the name of the plugin is enough warning there?)
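
For illustration, such a check could be implemented roughly like this, reusing the hypothetical Check base class sketched above (a sketch only - the plugin does not exist and the constructor signature is an assumption):

    class UnsafePython(Check):
        # Executes arbitrary Python from the eval YAML - hence "unsafe".
        def __init__(self, code):
            self.code = code

        def check(self, response):
            # The snippet sees the response object and fails by raising,
            # for example via a failing assert.
            exec(self.code, {"response": response})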

simonw commented 2 months ago

Another option is that unsafe plugins could mark themselves as such, and then you would have to run llm evals ... --unsafe to execute them.
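
A minimal sketch of how that opt-in could work (entirely hypothetical - neither the attribute nor the runner function exists):

    class UnsafePython(Check):
        unsafe = True  # flag the eval runner would inspect before running

    def run_checks(checks, response, allow_unsafe=False):
        for check in checks:
            if getattr(check, "unsafe", False) and not allow_unsafe:
                raise SystemExit(
                    "This eval uses unsafe checks - run with --unsafe to allow them"
                )
            check.run(response)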

simonw commented 2 months ago

I want an llm-evals-quickjs plugin, which will be safe because it will support checks like this one that run in a sandbox:

    checks:
    - quickjs: |
        if (response.text() !== "expected") {
          throw new Error("not expected");
        }
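
A rough sketch of how that check might work, using the quickjs Python package to run the snippet in a sandboxed interpreter (everything here is an assumption - the plugin doesn't exist yet and how checks plug into the runner is still undecided):

    import json
    import quickjs

    class QuickJS:
        def __init__(self, code):
            self.code = code

        def __call__(self, response):
            context = quickjs.Context()
            # Expose a response.text() function inside the sandbox, mirroring
            # the Python API, by serializing the response text into the context.
            context.eval(
                "var response = { text: function () { return %s; } };"
                % json.dumps(response.text())
            )
            try:
                context.eval(self.code)
            except Exception:
                # quickjs raises when the script throws; treat that as a failure
                return False
            return True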