openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Multiple Unit Test Failures Across OpenAI Assistants, Anthropic, and Google Gemini Libraries #1536

Open sakher opened 3 months ago

sakher commented 3 months ago

Describe the bug

I observed several unit-test failures involving the OpenAI Assistants, Anthropic, and Google Gemini libraries. Each is detailed below.

  1. Anthropic Library:

    • Issue: Recent releases changed the ContentBlock type into a discriminated Union, which cannot be instantiated directly. The type now covers both TextBlock and ToolUseBlock:
      ContentBlock = Annotated[Union[TextBlock, ToolUseBlock], PropertyInfo(discriminator="type")]
    • Impact: Our existing code instantiates ContentBlock directly, which now fails (see the first sketch after this list).
  2. OpenAI Library:

    • Issue: The latest update to the Assistants API introduced several breaking changes:
      • The retrieval tool has been renamed to file_search.
      • The assistant.file_ids parameter has been replaced by tool_resources.
      • The message.file_ids parameter has been replaced by attachments.
    • Impact: These changes break the functionality that depends on file handling and assistant resources (see the second sketch after this list).
  3. Gemini Library:

    • Issue: The Gemini library mishandles protobuf objects: it erroneously parses them as dictionaries.
    • Impact: This parsing error causes unexpected behavior and test failures (see the third sketch after this list).
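
For the Anthropic failure, here is a minimal sketch of a fix, assuming a recent anthropic release where TextBlock and ToolUseBlock are the concrete pydantic members of the Union: instantiate the member you need instead of ContentBlock, and branch on the concrete type when reading responses.

```python
# Sketch only: assumes a recent anthropic release where ContentBlock is a
# discriminated Union of TextBlock and ToolUseBlock.
from anthropic.types import TextBlock, ToolUseBlock

# Before (now fails): ContentBlock(type="text", text="hello")
# After: construct the concrete Union member the test needs.
block = TextBlock(type="text", text="hello")

def render(block) -> str:
    # Branch on the concrete type instead of assuming one shape.
    if isinstance(block, TextBlock):
        return block.text
    if isinstance(block, ToolUseBlock):
        return f"tool call: {block.name}"
    raise TypeError(f"unexpected content block: {type(block)!r}")
```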
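
For the OpenAI changes, a hedged migration sketch, assuming the Assistants v2 beta surface in openai>=1.21; the model name, vector store id, and file id below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Assistant: "retrieval" is now "file_search", and assistant.file_ids is
# replaced by tool_resources.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo",  # placeholder model
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": ["vs_123"]}},
)

# Message: message.file_ids is replaced by attachments.
thread = client.beta.threads.create()
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize the attached file.",
    attachments=[{"file_id": "file_123", "tools": [{"type": "file_search"}]}],
)
```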
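
For the Gemini defect, one possible workaround is to convert proto-plus messages to real dicts explicitly before any code treats them as dictionaries. This is a sketch under the assumption that google-generativeai returns proto-plus messages; to_plain_dict is a hypothetical helper, not part of the library:

```python
import proto  # proto-plus, pulled in by google-generativeai

def to_plain_dict(message):
    """Hypothetical helper: convert a proto-plus message into an actual
    dict, instead of treating the protobuf object itself as one."""
    if isinstance(message, proto.Message):
        # proto-plus exposes to_dict as a classmethod taking the instance.
        return type(message).to_dict(message)
    raise TypeError(f"expected a proto-plus Message, got {type(message)!r}")
```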

To Reproduce

Run the unit tests for v3.0.1.

Code snippets

No response

OS

macOS

Python version

3.9.0

Library version

3.0.1