openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Schelling point eval doesn't work #1533

Open · johny-b opened this issue 1 month ago

johny-b commented 1 month ago

Fresh installation (commit d3dc89042ddee879a68a326fdb37716ee518640c), running:

```
oaieval gpt-3.5-turbo schelling_point
```

It dies around the 580th sample with:

```
(...)
openai.BadRequestError: Error code: 400 - {'error': {'message': "Sorry! We've encountered an issue with repetitive patterns in your prompt. Please try again with a different prompt.", 'type': 'invalid_prompt', 'param': 'prompt', 'code': None}}
```

I guess the offending prompt was accepted by the OpenAI API when the eval was developed, but it is now rejected.
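For context, one way to keep a long eval run alive is to catch this specific 400 and skip the rejected sample instead of crashing. A minimal sketch of that approach, assuming matching on the `invalid_prompt` error type is acceptable (this is an illustration only, not necessarily what the linked fix does):

```python
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

def complete_or_skip(messages, model="gpt-3.5-turbo"):
    """Return a chat completion, or None if the API rejects the prompt."""
    try:
        return client.chat.completions.create(model=model, messages=messages)
    except openai.BadRequestError as e:
        # The failing samples surface as 400s whose error type is
        # 'invalid_prompt' ("repetitive patterns in your prompt").
        if "invalid_prompt" in str(e):
            return None  # skip this sample rather than aborting the eval
        raise
```

A caller would then count `None` results as skipped samples so the run can report how many prompts were rejected.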

JunShern commented 1 month ago

Thanks for flagging! https://github.com/openai/evals/pull/1534 has a fix, which should get merged soon.