mlcommons / modelgauge

Make it easy to automatically and uniformly measure the behavior of many AI Systems.
https://mlcommons.org/ai-safety/
Apache License 2.0

Let the user running the Test decide where to run the Annotator #255

Open · brianwgoldman opened this issue 7 months ago

brianwgoldman commented 7 months ago

Currently each Test specifies which Annotator it wants to run, with full control over where that Annotator is hosted. However, someone running the Test might want to select a different host.

For example, if a Test uses LlamaGuardAnnotator, that Annotator always runs on together.ai. However, the person running that Test may have private hosting of LlamaGuard they'd prefer to use, to save money or time or to keep their data private. The current framework can accommodate this, but it's hacky:

This hack may also require you to override the Test's __init__ to change what secrets it asks for, and you might have to define a new Annotator that allows swapping out how it makes the calls.
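For illustration, the kind of workaround this implies might look like the sketch below. None of these names (PrivateLlamaGuardAnnotator, SomeExistingTest, the endpoint URL, get_annotators) are the original snippet or guaranteed to match the real modelgauge API; the point is only to show how much boilerplate the hack requires:

```python
# A minimal sketch of the kind of workaround described above; all names and
# signatures here are hypothetical, not real modelgauge code.
from dataclasses import dataclass

import requests  # assumes the private host exposes a simple HTTP endpoint


@dataclass
class PrivateLlamaGuardAnnotator:
    """Stand-in for LlamaGuardAnnotator that calls a self-hosted LlamaGuard."""

    endpoint: str
    api_key: str

    def annotate(self, prompt: str, completion: str) -> dict:
        # Must return results in the same shape the Test expects from the
        # together.ai-backed LlamaGuardAnnotator.
        response = requests.post(
            self.endpoint,
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"prompt": prompt, "completion": completion},
            timeout=60,
        )
        response.raise_for_status()
        return response.json()


class SomeExistingTest:
    """Placeholder for the real Test class this hack would subclass."""

    def __init__(self, uid: str):
        self.uid = uid


class MyCopyOfTheTest(SomeExistingTest):
    def __init__(self, uid: str, private_llamaguard_key: str):
        # Override __init__ so the Test asks for the private host's secret
        # instead of a TogetherAI key, then swap in the replacement annotator.
        super().__init__(uid)
        self._annotator = PrivateLlamaGuardAnnotator(
            endpoint="https://llamaguard.internal/v1/annotate",  # hypothetical
            api_key=private_llamaguard_key,
        )

    def get_annotators(self) -> dict:
        # Hand the runner the swapped-in annotator instead of the default one.
        return {"llama_guard": self._annotator}
```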

We should find a more straightforward solution.

brianwgoldman commented 7 months ago

Jotting down some initial thoughts:

wpietri commented 7 months ago

Would it help if we expressed the concept of a "provider"? Especially with the open-source models, there are going to be a lot of ways to run them. We need more users to be sure, but a reasonable request seems to be, "Instead of the default TogetherAI provider, I'd like to use my existing [Google, Amazon, Azure, HuggingFace, etc.] account for common models." We'd obviously have gaps in the cross product, and there's a problem with name mapping. But maybe we could cover the common use cases with a single command-line switch, and let people do hackier things for the rarer ones?
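To make that concrete, one rough shape for such a switch could be a small registry keyed by (common model name, provider). Everything below, including the model identifiers and the --annotator-provider flag, is made up for this sketch and is not existing modelgauge functionality:

```python
# Purely a sketch of the "provider" idea from this comment; nothing here is
# real modelgauge code, and the model identifiers are only examples.
import argparse

# The name-mapping problem: the "same" model has a different identifier on
# each provider. Gaps in the cross product simply have no entry.
MODEL_NAMES = {
    ("llama-guard", "together"): "Meta-Llama/Llama-Guard-7b",
    ("llama-guard", "huggingface"): "meta-llama/LlamaGuard-7b",
}


def resolve_annotator(common_name: str, provider: str) -> str:
    """Map a common model name plus a provider choice to a concrete deployment."""
    try:
        return MODEL_NAMES[(common_name, provider)]
    except KeyError:
        raise SystemExit(
            f"Don't know how to run {common_name!r} on provider {provider!r}; "
            "fall back to the hackier per-Test override."
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # The single command-line switch imagined above.
    parser.add_argument("--annotator-provider", default="together")
    args = parser.parse_args()
    print(resolve_annotator("llama-guard", args.annotator_provider))
```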