openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Other
14.33k stars, 2.54k forks
issues
Ssomsak
#1538
SsomsakTH
closed
4 days ago
0
Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers
#1537
sakher
opened
1 week ago
0
Multiple Unit Test Failures Across OpenAI Assistants, Anthropic, and Google Gemini Libraries
#1536
sakher
opened
1 week ago
0
Update registry.py
#1535
syusuke9999
closed
1 week ago
3
Fix problematic sample in Schelling Point
#1534
JunShern
opened
1 month ago
0
Schelling point eval doesn't work
#1533
johny-b
opened
1 month ago
1
TensorFlow fails while no TensorFlow expected to run at all
#1532
artkpv
closed
1 month ago
1
Update README: Add Langtrace as an Eval vendor
#1531
karthikscale3
opened
1 month ago
0
Add support for gpt-4o
#1530
androettop
opened
1 month ago
0
Support for GPT-4o
#1529
PrashantDixit0
closed
1 month ago
3
[eval] Add IMO problems with exact answers
#1528
justinlinw
opened
1 month ago
1
What is this
#1527
DXv-3
opened
2 months ago
1
Dependabot configuration to update actions in workflows
#1526
ScottBrenner
closed
2 weeks ago
1
Release 3.0.1
#1525
etr2460
closed
2 months ago
0
Make the torch dep optional
#1524
etr2460
closed
2 months ago
0
show evals in wandb weave
#1522
yogeshg
opened
2 months ago
0
Bump actions/checkout to v4 in GitHub Actions workflows
#1521
ScottBrenner
closed
2 months ago
0
Release 3.0.0
#1520
etr2460
closed
2 months ago
0
Unpin dependencies
#1519
hauntsaninja
closed
2 months ago
0
Test
#1518
robertprast
closed
2 months ago
0
Allow for evals with no args
#1517
thesofakillers
closed
2 months ago
0
Relax version constraint for `playwright` module
#1516
danesherbs
closed
3 months ago
0
Getting started example doesn't work - oaieval attempts to update a None type object
#1515
jswang
closed
2 months ago
1
Switch from pyzstd to zstandard
#1514
josnyder-2
closed
3 months ago
0
When installing the project dependencies, I got: "ERROR: Could not build wheels for greenlet, which is required to install pyproject.toml-based projects"
#1513
JuanmaMenendez
closed
3 months ago
3
Remove citation prediction eval
#1512
ojaffe
closed
2 months ago
0
Added Quran Eval & Simple Fact Model-Graded Definition
#1511
sakher
opened
3 months ago
7
Add Classification Rule Articulation Eval
#1510
danesherbs
opened
3 months ago
0
eval pattern-concat-logic
#1508
natanaelwf
opened
3 months ago
1
Update ReadMe with New Cookbook link
#1507
royziv11
closed
3 months ago
0
Updates on existing solvers and bugged tool eval
#1506
ojaffe
closed
3 months ago
0
Fix specifying API arguments from the CLI
#1505
LoryPack
opened
3 months ago
0
Setting completion function args via CLI does not work
#1504
LoryPack
opened
3 months ago
0
Add Gemini Solver
#1503
ojaffe
closed
3 months ago
0
TogetherSolver
#1502
thesofakillers
closed
3 months ago
0
Unified create_retrying for all solvers
#1501
ojaffe
closed
3 months ago
0
Add Multi-Step Web Tasks
#1500
danesherbs
closed
3 months ago
0
Add 20 questions eval
#1499
inwaves
closed
3 months ago
0
AnthropicSolver
#1498
thesofakillers
closed
3 months ago
0
Add skill acquisition eval
#1497
inwaves
closed
3 months ago
0
Add Human-Relative MLAgentBench
#1496
danesherbs
closed
3 months ago
0
[Evals] Add eval for Dhivehi diacritical marks
#1495
aanaseer
opened
3 months ago
0
Add `**kwargs` to `OpenAIChatCompletionFn`
#1494
ezraporter
opened
3 months ago
0
`OpenAIChatCompletionFn` `__init__` should accept `**kwargs`
#1493
ezraporter
opened
3 months ago
0
Add Function Deduction eval
#1492
james-aung
closed
3 months ago
0
Add In-Context RL eval
#1491
james-aung
closed
3 months ago
1
Already Said That Eval
#1490
thesofakillers
closed
3 months ago
0
Track the Stat Eval
#1489
thesofakillers
closed
3 months ago
0
Identifying Variables Eval
#1488
thesofakillers
closed
3 months ago
0
Can't Do That Anymore Eval
#1487
ojaffe
closed
3 months ago
0