openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Other
15.11k stars 2.62k forks source link

Fix the is_chat_model function to work with gpt-4o #1550

Open LoryPack opened 3 months ago

LoryPack commented 3 months ago

Currently, evals attempts calling gpt-4o models via the Completion API, which is however incorrect. Simply adding gpt-4o to the list of models to be called with the Chat API fixes that.

Final checklist 👀

Submission agreement

By contributing to Evals, you are agreeing to make your evaluation logic and data under the same MIT license as this repository. You must have adequate rights to upload any data used in an Eval. OpenAI reserves the right to use this data in future service improvements to our product. Contributions to OpenAI Evals will be subject to our usual Usage Policies (https://platform.openai.com/docs/usage-policies).

Email address validation

If your submission is accepted, we will be granting GPT-4 access to a limited number of contributors. Access will be given to the email address associated with the commits on the merged pull request.

Limited availability acknowledgment

We know that you might be excited to contribute to OpenAI's mission, help improve our models, and gain access to GPT-4. However, due to the requirements mentioned above and the high volume of submissions, we will not be able to accept all submissions and thus not grant everyone who opens a PR GPT-4 access. We know this is disappointing, but we hope to set the right expectation before you open this PR.

Submit eval

Failure to fill out all required fields will result in the PR being closed.