Feature request
Currently, when repetitions > 1, the UI displays only the mean of each metric. It would be useful to also show the variance or standard deviation across repetitions.
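For illustration, here is a minimal Python sketch of the requested aggregation. It assumes each repetition yields one numeric score for a given example and metric. The function name `summarize_repetitions` and the input layout are hypothetical and not part of any existing API. It uses the sample (n - 1) variance from the standard-library `statistics` module.

```python
import statistics


def summarize_repetitions(scores: list[float]) -> dict[str, float]:
    """Summarize one metric's scores across repeated runs of the same example.

    Hypothetical helper: `scores` holds one value per repetition.
    Uses the sample (n - 1) variance, which needs at least 2 repetitions.
    """
    if len(scores) < 2:
        raise ValueError("need at least 2 repetitions to estimate variance")
    return {
        "mean": statistics.mean(scores),
        "variance": statistics.variance(scores),  # sample variance
        "std": statistics.stdev(scores),  # square root of the sample variance
    }


# Hypothetical example: a 0/1 correctness score from 4 repetitions.
summary = summarize_repetitions([1.0, 0.0, 1.0, 1.0])
print(summary)
```

The sample variance fits here because a handful of repetitions is a small sample of the model's output distribution. Displaying the population variance (`statistics.pvariance`) would be an equally reasonable choice.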
Motivation
Many LLM APIs (including OpenAI's) do not always return deterministic completions for the same input. Showing the spread of a metric would make its stability visible without having to inspect the individual repetitions.