openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Drop two datasets from steganography #1477

Closed: thesofakillers closed this 3 months ago

thesofakillers commented 3 months ago

Removing two datasets:

Impact on Steganography:

JunShern commented 3 months ago

Piggybacking on this PR to add a small fix for the OpenAIAssistantsSolver, which was causing tests to fail: https://github.com/openai/evals/pull/1477/commits/0360e4c44610b972dfb004f15071753d7e6525ff

thesofakillers commented 3 months ago

Closing this due to merge issues