jxnl / instructor

structured outputs for llms
https://python.useinstructor.com/
MIT License

Instructor doesn't support dictionary types #927

Closed thomasahle closed 1 month ago

thomasahle commented 1 month ago

What Model are you using? gpt-4o (see the repro below).

Describe the bug When I try to use types like dict inside a Pydantic model, I get errors from openai_schema_helper.

To Reproduce

import instructor
from pydantic import BaseModel, Field
from openai import OpenAI
import dotenv

class UserInfo(BaseModel):
    name_to_age: dict[str, int] = Field(description="The users name and age")

dotenv.load_dotenv()
client = instructor.patch(OpenAI())

print(UserInfo.model_json_schema())

# Extract structured data from natural language
user_info = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "system", "content": "Please provide the name and ages of the users as a dictionary."},
        {"role": "user", "content": "John Doe is 30 years old. Anne Smith is 25 years old."}],
)

print(user_info.name_to_age)

Expected behavior It should print {"John Doe": 30, "Anne Smith": 25}

Screenshots

Traceback (most recent call last):
  File "example.py", line 23, in <module>
    user_info = client.chat.completions.create(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/instructor/patch.py", line 140, in new_create_sync
    response_model, new_kwargs = handle_response_model(
                                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/instructor/process_response.py", line 239, in handle_response_model
    response_model = openai_schema(response_model)  # type: ignore
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/instructor/function_calls.py", line 498, in openai_schema
    return openai_schema_helper(cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/instructor/function_calls.py", line 475, in openai_schema_helper
    new_field_type = openai_schema_helper(field_type)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.12/site-packages/instructor/function_calls.py", line 491, in openai_schema_helper
    raise ValueError(f"Unsupported Class of {cls}!")
ValueError: Unsupported Class of dict[str, int]!

Notes:

The Json Schema for the type in the example is:

{
    'properties': {
        'name_to_age': {
            'additionalProperties': {'type': 'integer'},
            'description': 'The users name and age',
            'title': 'Name To Age',
            'type': 'object'
         }
      },
     'required': ['name_to_age'],
     'title': 'UserInfo',
      'type': 'object'
}

I tried all of the following types, and they all fail with errors similar to the one above:


from typing import Optional, Set, Tuple, Union
from pydantic import BaseModel

class Answer1(BaseModel):
    answer: dict[str, int]

class Answer2(BaseModel):
    answer: dict[str, str]

class Answer3(BaseModel):
    answer: list[list[float]]

class Answer4(BaseModel):
    answer: Union[int, str, float]

class Answer5(BaseModel):
    answer: Set[str]

class Answer6(BaseModel):
    answer: Tuple[int, float, str]

class Answer7(BaseModel):
    answer: Optional[dict[str, list[int]]]

class Answer8(BaseModel):
    answer: list[Optional[float]]

class Answer9(BaseModel):
    answer: dict[str, Optional[list[Tuple[int, str]]]]

class Answer10(BaseModel):
    answer: list[Union[str, dict[str, int]]]

thomasahle commented 1 month ago

It also seems types of the form int | str are not supported, while Union[int, str] is. Note this is not an issue with Pydantic or its JSON schema generation, since both support these types fine.
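
For reference, a quick check (a sketch, assuming Pydantic v2; the class names are just illustrative) shows both spellings produce the same JSON schema, so the gap appears to be in instructor's schema helper rather than in Pydantic:

from typing import Union
from pydantic import BaseModel

class PipeUnion(BaseModel):
    answer: int | str  # PEP 604 union syntax

class TypingUnion(BaseModel):
    answer: Union[int, str]

# Pydantic emits the same anyOf schema for both annotations
assert (
    PipeUnion.model_json_schema()["properties"]["answer"]
    == TypingUnion.model_json_schema()["properties"]["answer"]
)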

Also, if I use a recursive type, like

class Answer(BaseModel):
    text: str
    answers: list["Answer"]
Answer.update_forward_refs()

The openai_schema_helper function goes into an infinite recursion.
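
For comparison, Pydantic itself terminates on this model by emitting a $ref into $defs instead of expanding the annotation forever, so a helper that walks raw field types without tracking already-visited models will loop. A minimal check (a sketch, assuming Pydantic v2):

from pydantic import BaseModel

class Answer(BaseModel):
    text: str
    answers: list["Answer"]

# The generated schema points back at the model via "$defs"/"$ref",
# which is what keeps the self-reference finite.
schema = Answer.model_json_schema()
print(schema["$defs"]["Answer"]["properties"]["answers"])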

ivanleomk commented 1 month ago

@thomasahle I just released instructor 1.4.0, which should fix this issue. Could you give it a shot? Note that for the dict field, I tested it and it seems like only JSON mode is able to generate the right response.

I feel like that's an issue with the tool calling implementation itself (since it might just not know how to match the type), but I'll look into it later in the week when I get more time.

thomasahle commented 1 month ago

I ran the original code above, but got InstructorRetryException: RetryError[<Future at 0x11c0c50d0 state=finished raised ValidationError>].

I tried changing response_model=UserInfo to response_model=dict[str, int], but got:

File /opt/homebrew/lib/python3.12/site-packages/instructor/process_response.py:227, in handle_response_model(response_model, mode, **kwargs)
    225     iterable_element_class = get_args(response_model)[0]
    226     response_model = IterableModel(iterable_element_class)
--> 227 if not issubclass(response_model, OpenAISchema):
    228     response_model = openai_schema(response_model)  # type: ignore
    230 if new_kwargs.get("stream", False) and not issubclass(
    231     response_model, (IterableBase, PartialBase)
    232 ):

File <frozen abc>:123, in __subclasscheck__(cls, subclass)

TypeError: issubclass() arg 1 must be a class

Does it work for you?

ivanleomk commented 1 month ago

@thomasahle, if you use the original code, I found that it works nicely in JSON mode if you change the client as follows. That should fix the RetryError, though I'm honestly not sure why.

client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)
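
For completeness, the original repro with just the client line swapped to JSON mode (a sketch, assuming instructor >= 1.4.0 and an OPENAI_API_KEY in the environment):

import instructor
import dotenv
from openai import OpenAI
from pydantic import BaseModel, Field

class UserInfo(BaseModel):
    name_to_age: dict[str, int] = Field(description="The users name and age")

dotenv.load_dotenv()
# JSON mode asks the model for a raw JSON object instead of a tool call,
# which keeps the dict-typed field out of a tool schema.
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.JSON)

user_info = client.chat.completions.create(
    model="gpt-4o",
    response_model=UserInfo,
    messages=[
        {"role": "system", "content": "Please provide the name and ages of the users as a dictionary."},
        {"role": "user", "content": "John Doe is 30 years old. Anne Smith is 25 years old."},
    ],
)
print(user_info.name_to_age)  # expected: {'John Doe': 30, 'Anne Smith': 25}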

In terms of support for dict[str, int], we can probably add a guard for it in the next release, but it's not a great way to prompt the model for a response. Tool calling tends to benefit more from structured output, so something like the iterable below or a list of User objects has consistently worked for me.

import instructor
import dotenv
from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

dotenv.load_dotenv()
client = instructor.from_openai(OpenAI(), mode=instructor.Mode.TOOLS_STRICT)

# Extract structured data from natural language
users = client.chat.completions.create_iterable(
    model="gpt-4o-mini",
    response_model=User,
    messages=[
        {
            "role": "system",
            "content": "Please provide the name and ages of the users as a dictionary.",
        },
        {
            "role": "user",
            "content": "John Doe is 30 years old. Anne Smith is 25 years old.",
        },
    ],
)

for user in users:
    print(user)
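
If the dict[str, int] shape from the original issue is still needed, the same iterable can be collected into it directly (in place of the print loop above):

# consume the iterable once, building the mapping instead of printing each user
name_to_age = {user.name: user.age for user in users}
print(name_to_age)  # expected: {'John Doe': 30, 'Anne Smith': 25}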

See the primitives we support at https://python.useinstructor.com/concepts/types.

ivanleomk commented 1 month ago

Closing this issue since the original problem of an unsupported dictionary field in the Pydantic model has been resolved.