openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Fix Unit Test Failures in OpenAI, Anthropic, and Google Gemini Resolvers #1537

Open sakher opened 1 week ago


Fixes all errors reported in #1536.

This pull request addresses multiple unit test failures in the resolvers that use the OpenAI Assistants, Anthropic, and Google Gemini client libraries. Here are the key changes made to resolve these issues:

Bug Description:

  1. Anthropic Library:

    • Issue: The ContentBlock type has been changed to a discriminated Union, which cannot be instantiated directly; it now represents either a TextBlock or a ToolUseBlock:
      ContentBlock = Annotated[Union[TextBlock, ToolUseBlock], PropertyInfo(discriminator="type")]
    • Fix: Replaced ContentBlock with TextBlock in the relevant sections of the code (see the first sketch after this list).
  2. OpenAI Library:

    • Issue: The Assistants API introduced breaking changes:
      • The retrieval tool was renamed to file_search
      • The assistant.file_ids parameter was replaced by tool_resources
      • The message.file_ids parameter was replaced by attachments
    • Fix: Updated the code to reflect these changes (see the second sketch after this list).
  3. Gemini Library:
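
Below is a minimal sketch of the kind of change involved in the Anthropic resolver, assuming the code previously constructed or type-checked ContentBlock directly; the function and variable names are illustrative, not the exact evals code:

    from anthropic.types import TextBlock  # ToolUseBlock is the other member of the union

    # Before: constructing ContentBlock now fails, because ContentBlock is an
    # Annotated Union and unions cannot be instantiated.
    # block = ContentBlock(type="text", text="hello")

    # After: construct and match on the concrete TextBlock instead.
    block = TextBlock(type="text", text="hello")

    def extract_text(content_blocks):
        """Collect the text parts of a response, skipping tool-use blocks."""
        return "".join(b.text for b in content_blocks if isinstance(b, TextBlock))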

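The Assistants changes follow the v2 interface of the openai Python SDK; here is a hedged before/after sketch, where identifiers such as file_id and vector_store_id are placeholders and the exact call sites in the resolver may differ:

    from openai import OpenAI

    client = OpenAI()
    file_id = "file-abc123"        # placeholder ID of an uploaded file
    vector_store_id = "vs_abc123"  # placeholder vector store containing that file

    # Before (Assistants v1):
    #   tools=[{"type": "retrieval"}], file_ids=[file_id]   # on the assistant
    #   messages.create(..., file_ids=[file_id])            # on the message

    # After (Assistants v2): "retrieval" becomes "file_search", assistant-level
    # files live under tool_resources, and per-message files are attachments.
    assistant = client.beta.assistants.create(
        model="gpt-4-turbo",
        tools=[{"type": "file_search"}],
        tool_resources={"file_search": {"vector_store_ids": [vector_store_id]}},
    )

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content="Summarize the attached file.",
        attachments=[{"file_id": file_id, "tools": [{"type": "file_search"}]}],
    )
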
To Reproduce: Run the unit tests for version 3.0.1 to observe the failures.

With these changes, all unit tests pass and the resolvers remain compatible with the latest versions of these libraries.
