openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Other
15.09k stars
2.62k forks
issues
Request for Global Memory Across Different Chats (Across Sessions)
#1570
rodrigoreis
opened
1 week ago
0
Vector mapping, origins, 2-3-4D definitions.
#1569
Really-69
opened
2 weeks ago
1
Update realtime solver to use retries
#1568
liPatrick
closed
2 weeks ago
0
Project installation fails: `tensorflow` conflicting dependencies
#1567
djbb7
opened
2 weeks ago
0
ERROR: Failed building wheel for numpy. clang error compiler does not support 'faltivec
#1566
jinchi2013
opened
2 weeks ago
1
Allow gpt-4o, gpt-4o-mini, and ft:gpt-4o-mini-.* completions
#1565
dacox
closed
1 month ago
0
AttributeError: module 'openai' has no attribute 'error'
#1564
sahilrajput03
opened
1 month ago
2
Unable to install via `pip install evals`
#1563
sahilrajput03
closed
1 month ago
1
Is Evals repo being replaced by the Evaluations feature in the Playground?
#1562
sakher
opened
1 month ago
1
Ice linguistic benchmark
#1561
bjarkiarmanns
opened
1 month ago
0
20240930 steven exception handling usage tokens
#1560
sjadler2004
closed
1 month ago
0
Text2code2video eval
#1559
bhack
opened
2 months ago
0
Add support for new models (gpt-4o, o1-preview and o1-mini)
#1558
sakher
opened
2 months ago
0
o1 release breaks token usage stats
#1556
lucapericlp
opened
2 months ago
0
Bugfixing completion stats break with new reasoning tokens release
#1555
lucapericlp
opened
2 months ago
2
anthropic_solver.py
#1554
iHuydang
opened
2 months ago
1
Update anthropic_solver.py
#1553
iHuydang
closed
2 months ago
0
Add new Eval
#1552
dr-salman-ahmad
closed
2 months ago
1
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini
#1551
RobinWitch
opened
3 months ago
0
Fix the is_chat_model function to work with gpt-4o
#1550
LoryPack
opened
3 months ago
0
GPU solver fix: make sure larger batch sizes are being used
#1549
farzadab
closed
3 months ago
0
Added Icelandic QA evaluation data from news texts
#1548
thorunna
opened
3 months ago
0
Added Icelandic QA evaluation data from Wikipedia
#1547
thorunna
opened
3 months ago
0
Updating make-me-say to be compatible with Solvers
#1546
lennart-finke
opened
3 months ago
1
Fix Information exposure alert through an exception #1543
#1545
arpitjain099
opened
3 months ago
0
Fix log injection error
#1544
arpitjain099
opened
3 months ago
0
Information exposure alert through an exception
#1543
arpitjain099
opened
3 months ago
0
Log injection alert
#1542
arpitjain099
opened
3 months ago
0
[WIP] Multi-GPU solver
#1541
farzadab
closed
3 months ago
0
v1
#1540
juberti
closed
3 months ago
1
Remove global OpenAI client initialization
#1539
michaelAlvarino
opened
4 months ago
0
Ssomsak
#1538
SsomsakTH
closed
5 months ago
0
Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers
#1537
sakher
opened
5 months ago
1
Multiple Unit Test Failures Across OpenAI Assistants, Anthropic, and Google Gemini Libraries
#1536
sakher
opened
5 months ago
0
Update registry.py
#1535
syusuke9999
closed
5 months ago
3
Fix problematic sample in Schelling Point
#1534
JunShern
opened
6 months ago
0
Schelling point eval doesn't work
#1533
johny-b
opened
6 months ago
1
TensorFlow fails while no TensorFlow expected to run at all
#1532
artkpv
closed
6 months ago
1
Update README: Add Langtrace as an Eval vendor
#1531
karthikscale3
opened
6 months ago
0
Add support for gpt-4o
#1530
androettop
opened
6 months ago
2
Support for GPT-4o
#1529
PrashantDixit0
closed
6 months ago
3
[eval] Add IMO problems with exact answers
#1528
justinlinw
closed
4 months ago
1
What is this
#1527
DXv-3
opened
6 months ago
1
Dependabot configuration to update actions in workflows
#1526
ScottBrenner
closed
5 months ago
1
Release 3.0.1
#1525
etr2460
closed
6 months ago
0
Make the torch dep optional
#1524
etr2460
closed
6 months ago
0
show evals in wandb weave
#1522
yogeshg
opened
7 months ago
0
Bump actions/checkout to v4 in GitHub Actions workflows
#1521
ScottBrenner
closed
7 months ago
0
Release 3.0.0
#1520
etr2460
closed
7 months ago
0
Unpin dependencies
#1519
hauntsaninja
closed
7 months ago
0