mlcommons / modelgauge

Make it easy to automatically and uniformly measure the behavior of many AI Systems.
https://mlcommons.org/ai-safety/
Apache License 2.0

Let the user running the Test decide where to run the Annotator #255

Open · brianwgoldman opened this issue 7 months ago

brianwgoldman commented 7 months ago

Currently each Test specifies which Annotator it wants to run, with full control over where that Annotator is hosted. However, someone running the Test might want to select a different host.

For example, if a Test uses LlamaGuardAnnotator, that Annotator always runs on together.ai. However, the person running that Test may have private hosting of LlamaGuard they'd prefer to use, to save money or time or to keep their data private. The current framework can accommodate this, but it's hacky:

This hack may also require you to override the Test's __init__ to change what secrets it asks for, and you might have to define a new Annotator that allows swapping out how it makes the calls.
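For illustration, the kind of workaround this implies might look like the sketch below. None of these names (PrivateLlamaGuardAnnotator, SomeExistingTest, the endpoint URL, get_annotators) are the original snippet or guaranteed to match the real modelgauge API; the point is only to show how much boilerplate the hack requires:

```python
# A minimal sketch of the kind of workaround described above; all names and
# signatures here are hypothetical, not real modelgauge code.
from dataclasses import dataclass

import requests  # assumes the private host exposes a simple HTTP endpoint


@dataclass
class PrivateLlamaGuardAnnotator:
    """Stand-in for LlamaGuardAnnotator that calls a self-hosted LlamaGuard."""

    endpoint: str
    api_key: str

    def annotate(self, prompt: str, completion: str) -> dict:
        # Must return results in the same shape the Test expects from the
        # together.ai-backed LlamaGuardAnnotator.
        response = requests.post(
            self.endpoint,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": prompt, "completion": completion},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()


class SomeExistingTest:
    """Placeholder for the real Test class this hack would subclass."""

    def __init__(self, uid: str):
        self.uid = uid


class MyCopyOfTheTest(SomeExistingTest):
    def __init__(self, uid: str, private_llamaguard_key: str):
        # Override __init__ so the Test asks for the private host's secret
        # instead of a TogetherAI key, then swap in the replacement annotator.
        super().__init__(uid)
        self._annotator = PrivateLlamaGuardAnnotator(
            endpoint="https://llamaguard.internal/v1/annotate",  # hypothetical
            api_key=private_llamaguard_key,
        )

    def get_annotators(self) -> dict:
        # Hand the runner the swapped-in annotator instead of the default one.
        return {"llama_guard": self._annotator}
```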

We should find a more straightforward solution.

brianwgoldman commented 7 months ago

Jotting down some initial thoughts:

wpietri commented 7 months ago

Would it help if we expressed the concept of a "provider"? Especially with the open-source models, there are going to be a lot of ways to run them. We need more users to be sure, but a reasonable request seems to be, "Instead of the default TogetherAI provider, I'd like to use my existing [Google, Amazon, Azure, HuggingFace, etc.] account for common models." We'd obviously have gaps in the cross product, and there's a problem with name mapping. But maybe we could cover the common use cases with a single command-line switch, and let people do hackier things for the rarer ones?
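To make that concrete, one rough shape for such a switch could be a small registry keyed by (common model name, provider). Everything below, including the model identifiers and the --annotator-provider flag, is made up for this sketch and is not existing modelgauge functionality:

```python
# Purely a sketch of the "provider" idea from this comment; nothing here is
# real modelgauge code, and the model identifiers are only examples.
import argparse

# The name-mapping problem: the "same" model has a different identifier on
# each provider. Gaps in the cross product simply have no entry.
MODEL_NAMES = {
    ("llama-guard", "together"): "Meta-Llama/Llama-Guard-7b",
    ("llama-guard", "huggingface"): "meta-llama/LlamaGuard-7b",
}


def resolve_annotator(common_name: str, provider: str) -> str:
    """Map a common model name plus a provider choice to a concrete deployment."""
    try:
        return MODEL_NAMES[(common_name, provider)]
    except KeyError:
        raise SystemExit(
            f"Don't know how to run {common_name!r} on provider {provider!r}; "
            "fall back to the hackier per-Test override."
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # The single command-line switch imagined above.
    parser.add_argument("--annotator-provider", default="together")
    args = parser.parse_args()
    print(resolve_annotator("llama-guard", args.annotator_provider))
```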