promptfoo / promptfoo

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
https://www.promptfoo.dev/
MIT License

How to compare two separate eval runs in the Web UI #1025

Closed: efung closed this issue 3 weeks ago

efung commented 1 month ago

In the config, if I've defined two prompts, or two providers, I see the side-by-side results in the Web UI.
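For context, a minimal sketch of the kind of config that gives this side-by-side view. The prompt text, the provider IDs, and the test variable are illustrative assumptions, not taken from my actual setup:

```yaml
# promptfooconfig.yaml (illustrative sketch; all values are assumptions)
prompts:
  # Two prompts: each one becomes its own column in the results table
  - "Summarize this in one sentence: {{text}}"
  - "Summarize this in three bullet points: {{text}}"

providers:
  # Two providers: each prompt is run against each provider
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-haiku-20240307

tests:
  # One test case; `text` fills the {{text}} placeholder in the prompts
  - vars:
      text: "promptfoo is a tool for testing and evaluating LLM prompts."
```

Because everything is declared in one config, a single eval run produces all the columns, which is why they show up together in the Web UI.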

What if someone else and I have each run a separate eval on a single prompt or provider, and we want to combine our two results and view them together in the promptfoo Web UI? Can that be done?

typpo commented 1 month ago

Two thoughts on how to solve this:

efung commented 1 month ago

> Purely on the UI side, have a way to select existing evals to include in the view, or

I like this UI solution for the use case of multiple collaborators running their own evals, then wanting to do comparisons among the eval runs.

Merging evals together might have its own use cases, but I think it would be cumbersome for browsing.

typpo commented 3 weeks ago

"Compare evals" functionality has landed in the UI, under Table Settings.