openai evals issues - Githubissues

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Other

15.09k stars 2.62k forks

issues

Request for Global Memory Across Different Chats (Across Sessions)

#1570 rodrigoreis opened 1 week ago
0
Vector mapping, origins, 2-3-4D definitions.

#1569 Really-69 opened 2 weeks ago
1
Update realtime solver to use retries

#1568 liPatrick closed 2 weeks ago
0
Project installation fails: `tensorflow` conflicting dependencies

#1567 djbb7 opened 2 weeks ago
0
ERROR: Failed building wheel for numpy. clang error compiler does not support 'faltivec

#1566 jinchi2013 opened 2 weeks ago
1
Allow gpt-4o, gpt-4o-mini, and ft:gpt-4o-mini-.* completions

#1565 dacox closed 1 month ago
0
AttributeError: module 'openai' has no attribute 'error'

#1564 sahilrajput03 opened 1 month ago
2
Unable to install via `pip install evals`

#1563 sahilrajput03 closed 1 month ago
1
Is Evals repo being replaced by the Evaluations feature in the Playground?

#1562 sakher opened 1 month ago
1
Ice linguistic benchmark

#1561 bjarkiarmanns opened 1 month ago
0
20240930 steven exception handling usage tokens

#1560 sjadler2004 closed 1 month ago
0
Text2code2video eval

#1559 bhack opened 2 months ago
0
Add support for new models (gpt-4o, o1-preview and o1-mini)

#1558 sakher opened 2 months ago
0
o1 release breaks token usage stats

#1556 lucapericlp opened 2 months ago
0
Bugfixing completion stats break with new reasoning tokens release

#1555 lucapericlp opened 2 months ago
2
anthropic_solver.py

#1554 iHuydang opened 2 months ago
1
Update anthropic_solver.py

#1553 iHuydang closed 2 months ago
0
Add new Eval

#1552 dr-salman-ahmad closed 2 months ago
1
Fix a bug in examples/mmlu.ipynb when using gpt-4o or gpt-4o-mini

#1551 RobinWitch opened 3 months ago
0
Fix the is_chat_model function to work with gpt-4o

#1550 LoryPack opened 3 months ago
0
GPU solver fix: make sure larger batch sizes are being used

#1549 farzadab closed 3 months ago
0
Added Icelandic QA evaluation data from news texts

#1548 thorunna opened 3 months ago
0
Added Icelandic QA evaluation data from Wikipedia

#1547 thorunna opened 3 months ago
0
Updating make-me-say to be compatible with Solvers

#1546 lennart-finke opened 3 months ago
1
Fix Information exposure alert through an exception #1543

#1545 arpitjain099 opened 3 months ago
0
Fix log injection error

#1544 arpitjain099 opened 3 months ago
0
Information exposure alert through an exception

#1543 arpitjain099 opened 3 months ago
0
Log injection alert

#1542 arpitjain099 opened 3 months ago
0
[WIP] Multi-GPU solver

#1541 farzadab closed 3 months ago
0
v1

#1540 juberti closed 3 months ago
1
Remove global OpenAI client initialization

#1539 michaelAlvarino opened 4 months ago
0
Ssomsak

#1538 SsomsakTH closed 5 months ago
0
Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers

#1537 sakher opened 5 months ago
1
Multiple Unit Test Failures Across OpenAI Assistants, Anthropic, and Google Gemini Libraries

#1536 sakher opened 5 months ago
0
Update registry.py

#1535 syusuke9999 closed 5 months ago
3
Fix problematic sample in Schelling Point

#1534 JunShern opened 6 months ago
0
Schelling point eval doesn't work

#1533 johny-b opened 6 months ago
1
TensorFlow fails while no TensorFlow expected to run at all

#1532 artkpv closed 6 months ago
1
Update README: Add Langtrace as an Eval vendor

#1531 karthikscale3 opened 6 months ago
0
Add support for gpt-4o

#1530 androettop opened 6 months ago
2
Support for GPT-4o

#1529 PrashantDixit0 closed 6 months ago
3
[eval] Add IMO problems with exact answers

#1528 justinlinw closed 4 months ago
1
What is this

#1527 DXv-3 opened 6 months ago
1
Dependabot configuration to update actions in workflows

#1526 ScottBrenner closed 5 months ago
1
Release 3.0.1

#1525 etr2460 closed 6 months ago
0
Make the torch dep optional

#1524 etr2460 closed 6 months ago
0
show evals in wandb weave

#1522 yogeshg opened 7 months ago
0
Bump actions/checkout to v4 in GitHub Actions workflows

#1521 ScottBrenner closed 7 months ago
0
Release 3.0.0

#1520 etr2460 closed 7 months ago
0
Unpin dependencies

#1519 hauntsaninja closed 7 months ago
0