Add variance/std when displaying evaluations with repetitions

langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations

https://smith.langchain.com/

MIT License


Add variance/std when displaying evaluations with repetitions #768

Open: Bradley-Butcher opened this issue 4 weeks ago

Bradley-Butcher commented 4 weeks ago

Feature request

Currently, when an evaluation is run with repetitions > 1, the UI displays only the mean of each metric. It would be useful to also show the variance or standard deviation.
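For illustration, here is a minimal sketch of the requested statistics. It assumes the per-example scores from each repetition are already available as plain floats. `summarize_repetitions` is a hypothetical helper and is not part of the LangSmith SDK. It uses Python's standard `statistics` module and the sample (n - 1) variance, which is undefined when there is only one repetition.

```python
from statistics import mean, stdev, variance


def summarize_repetitions(
    scores_by_example: dict[str, list[float]],
) -> dict[str, dict[str, float]]:
    """Hypothetical helper: summarize one metric's repeated scores per example.

    Assumes each example ID maps to a non-empty list of scores, one per repetition.
    Variance and std are the sample (n - 1) versions, so they need >= 2 repetitions.
    """
    summary: dict[str, dict[str, float]] = {}
    for example_id, scores in scores_by_example.items():
        if len(scores) < 2:
            # With a single repetition there is no spread to report.
            summary[example_id] = {
                "mean": mean(scores),
                "variance": float("nan"),
                "std": float("nan"),
            }
            continue
        summary[example_id] = {
            "mean": mean(scores),
            "variance": variance(scores),  # sample variance, divides by n - 1
            "std": stdev(scores),  # square root of the sample variance
        }
    return summary


if __name__ == "__main__":
    # Three repetitions per example (illustrative scores, not real data).
    print(summarize_repetitions({"ex-1": [1.0, 0.0, 1.0], "ex-2": [0.5, 0.5, 0.5]}))
```

With identical scores across repetitions (as in `ex-2`), the standard deviation is 0, which is the "stable metric" case the request wants to surface at a glance. Whether to use the sample or the population (`pstdev`) version is a design choice. With the small repetition counts typical of evals, the two can differ noticeably.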

Motivation

Many LLM APIs (including OpenAI's) do not reliably return deterministic completions, even for identical inputs. Showing the spread of each metric would make it possible to judge its stability without inspecting the individual repetitions.

hinthornw commented 4 weeks ago

Yeah, I've been wanting this myself. We'll put it on the roadmap, but as of now I can't promise a timeline.