wandb / weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
https://wandb.me/weave
Apache License 2.0

chore(weave): EvalCompare: Fixes issues when models are same version in scorecard section #1903

Closed: tssweeney closed this 1 month ago

tssweeney commented 1 month ago

Example: https://app.wandb.test/l2k2/hellaswag/weave/compare-evaluations?evaluationCallIds=%5B%22ff014327-f0b8-4959-a828-2a617b2969a4%22%2C%22403b9bd6-eda4-4908-b79d-06b68cac33c0%22%5D

When the evaluations being compared shared the same model version, the scorecard did not show distinct values for each evaluation.

circle-job-mirror[bot] commented 1 month ago

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=1b779341eed79585407864065deffb2a84e2f7b0