"Maybe" pattern spoils categorization?

lukyanov commented 3 weeks ago

[x] This is actually a bug report.
[ ] I am not getting good LLM Results
[ ] I have tried asking for help in the community on discord or discussions and have not received a response.
[x] I have tried searching the documentation and have not found an answer.

What Model are you using?

[x] gpt-3.5-turbo
[x] gpt-4-turbo
[x] gpt-4
[ ] Other (please specify)

Describe the bug

I am trying to categorize items using a simple Literal list. It works as expected until I add the "Maybe" pattern described in the docs.
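The "Maybe" pattern wraps the real result in a model that also carries an error flag and a message, so the LLM can report that nothing was found instead of inventing data. Below is a minimal generic sketch, assuming Pydantic v2 generic models. The Maybe class name is hypothetical, and this is not necessarily how instructor's docs implement the pattern. Maybe[ReceiptModel] has the same fields as the MaybeReceipt class used later in this report:

```python
from typing import Generic, List, Literal, Optional, TypeVar

from pydantic import BaseModel, Field

T = TypeVar("T", bound=BaseModel)


class Maybe(BaseModel, Generic[T]):
    """Hypothetical generic wrapper: either a result or an error message."""

    result: Optional[T] = Field(default=None, description="The extracted data, if any was found.")
    error: bool = Field(default=False, description="True if nothing could be extracted.")
    message: Optional[str] = Field(default=None, description="Why extraction failed, if it did.")


class PurchasedItem(BaseModel):
    name: str
    category: Literal["drinks", "home and living", "clothing and shoes", "other"]


class ReceiptModel(BaseModel):
    items: List[PurchasedItem]


# Parametrizing the generic produces a concrete model with result/error/message fields.
MaybeReceipt = Maybe[ReceiptModel]

# A successful extraction and an explicit "nothing found" answer both validate.
ok = MaybeReceipt.model_validate({"result": {"items": [{"name": "beer", "category": "drinks"}]}})
failed = MaybeReceipt.model_validate({"error": True, "message": "No purchases found."})
print(ok.result.items[0].category)
print(failed.result)
```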

To Reproduce

This code works as expected:

from typing import List, Literal
from pydantic import BaseModel, Field
import instructor
import tenacity
import logging
from openai import OpenAI

openai_client = instructor.patch(OpenAI(), mode=instructor.Mode.TOOLS)

class PurchasedItem(BaseModel):
    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other"
    ] = Field(description="Correctly assign one of the predefined categories to the item.")

class ReceiptModel(BaseModel):
    items: List[PurchasedItem]

def extract_receipt_data(text):
    receipt_data = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=ReceiptModel,
        max_retries=tenacity.Retrying(
            stop=tenacity.stop_after_attempt(1),
            after=lambda s: logging.info(f"After: {s}"),
        ),
        messages=[{
            "role": "user",
            "content": f"Find purchased items in the following text: {text}"
        }],
    )
    return receipt_data

if __name__ == "__main__":
    text = """
    Today I bought: a table, a chair, a laptop and a beer.
    """
    result = extract_receipt_data(text)
    print(result.model_dump_json(indent=2))

Output:

{
  "items": [
    {
      "name": "table",
      "category": "home and living"
    },
    {
      "name": "chair",
      "category": "home and living"
    },
    {
      "name": "laptop",
      "category": "other"
    },
    {
      "name": "beer",
      "category": "drinks"
    }
  ]
}

However, as soon as I add MaybeReceipt, the model stops using the categories defined in Literal:

from typing import List, Literal, Optional
from pydantic import BaseModel, Field
import instructor
import tenacity
import logging
from openai import OpenAI

openai_client = instructor.patch(OpenAI(), mode=instructor.Mode.TOOLS)

class PurchasedItem(BaseModel):
    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other"
    ] = Field(description="Correctly assign one of the predefined categories to the item.")

class ReceiptModel(BaseModel):
    items: List[PurchasedItem]

class MaybeReceipt(BaseModel):
    result: Optional[ReceiptModel] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)

def extract_receipt_data(text):
    receipt_data = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=MaybeReceipt,
        max_retries=tenacity.Retrying(
            stop=tenacity.stop_after_attempt(1),
            after=lambda s: logging.info(f"After: {s}"),
        ),
        messages=[{
            "role": "user",
            "content": f"Find purchased items in the following text: {text}"
        }],
    )
    return receipt_data

if __name__ == "__main__":
    text = """
    Today I bought: a table, a chair, a laptop and a beer.
    """
    result = extract_receipt_data(text)
    print(result.model_dump_json(indent=2))

Output:

pydantic_core._pydantic_core.ValidationError: 4 validation errors for MaybeReceipt
result.items.0.category
  Input should be 'drinks', 'home and living', 'clothing and shoes' or 'other' [type=literal_error, input_value='furniture', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/literal_error
result.items.1.category
  Input should be 'drinks', 'home and living', 'clothing and shoes' or 'other' [type=literal_error, input_value='furniture', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/literal_error
result.items.2.category
  Input should be 'drinks', 'home and living', 'clothing and shoes' or 'other' [type=literal_error, input_value='electronics', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/literal_error
result.items.3.category
  Input should be 'drinks', 'home and living', 'clothing and shoes' or 'other' [type=literal_error, input_value='food', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/literal_error

The behavior is the same for all three OpenAI models listed above.
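Because the failure is raised by Pydantic rather than by the API, it can be reproduced offline by feeding the rejected category values from the traceback into model_validate. A minimal sketch, assuming Pydantic v2. The llm_output dict and the invalid_categories helper are hypothetical, and the item names are assumed from the order of the input text (the traceback only shows the categories). The last lines check that the Literal still appears in the JSON schema Pydantic generates for MaybeReceipt, under $defs:

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class PurchasedItem(BaseModel):
    name: str
    category: Literal["drinks", "home and living", "clothing and shoes", "other"] = Field(
        description="Correctly assign one of the predefined categories to the item."
    )


class ReceiptModel(BaseModel):
    items: List[PurchasedItem]


class MaybeReceipt(BaseModel):
    result: Optional[ReceiptModel] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)


# Hypothetical LLM answer: categories from the traceback, names assumed from the input order.
llm_output = {
    "result": {
        "items": [
            {"name": "table", "category": "furniture"},
            {"name": "chair", "category": "furniture"},
            {"name": "laptop", "category": "electronics"},
            {"name": "beer", "category": "food"},
        ]
    }
}


def invalid_categories(data: dict) -> list:
    """Return (dotted location, rejected value) pairs for every failing field."""
    try:
        MaybeReceipt.model_validate(data)
    except ValidationError as exc:
        return [(".".join(str(part) for part in err["loc"]), err["input"]) for err in exc.errors()]
    return []


for location, value in invalid_categories(llm_output):
    print(location, value)

# The allowed values are still in the schema, just one extra $ref away.
schema = MaybeReceipt.model_json_schema()
print(schema["$defs"]["PurchasedItem"]["properties"]["category"]["enum"])
```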

economy commented 3 weeks ago

I've noticed that the more complicated a Pydantic object is, especially with nesting, the more likely it is to throw the model off. One way to combat this is to add more detail to the description of the category field, e.g. "If you can't determine which category, use 'other'". I've also seen (but have not tried) adding concrete few-shot examples as schema extras in Pydantic.

If anyone has a more thorough solution, I'm definitely interested, as I run into this issue a lot as well.
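A minimal sketch of both suggestions, assuming Pydantic v2. It uses a more directive description with an explicit fallback to 'other', and it attaches few-shot examples through the examples and json_schema_extra parameters of Field. The example items (sofa, sneakers) are illustrative only, and it is untested here whether the model follows them any better:

```python
from typing import Literal

from pydantic import BaseModel, Field

Category = Literal["drinks", "home and living", "clothing and shoes", "other"]


class PurchasedItem(BaseModel):
    name: str
    category: Category = Field(
        description=(
            "Assign exactly one of the predefined categories to the item. "
            "If you can't determine which category applies, use 'other'."
        ),
        # Field-level examples end up as an "examples" key in the JSON schema.
        examples=["home and living", "drinks"],
    )


class ReceiptModel(BaseModel):
    items: list[PurchasedItem] = Field(
        description="Every purchased item mentioned in the text.",
        # json_schema_extra merges arbitrary keys into this field's schema.
        json_schema_extra={
            "examples": [
                [
                    {"name": "sofa", "category": "home and living"},
                    {"name": "sneakers", "category": "clothing and shoes"},
                ]
            ]
        },
    )


schema = ReceiptModel.model_json_schema()
print(schema["$defs"]["PurchasedItem"]["properties"]["category"])
print(schema["properties"]["items"]["examples"])
```

Both keys travel with the JSON schema, so they reach the LLM only if the schema is passed along unchanged.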

jxnl commented 2 weeks ago

This might be an issue of both:

1) few-shot examples
2) better prompting
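A sketch of what few-shot examples plus a more explicit prompt could look like for this repro. The build_messages helper and the worked example are hypothetical. The allowed categories are derived from the Literal with typing.get_args, so the prompt and the schema cannot drift apart. It is an open question whether a plain-text assistant turn is the best way to show an example when instructor runs in Mode.TOOLS:

```python
from typing import Literal, get_args

Category = Literal["drinks", "home and living", "clothing and shoes", "other"]

# Hypothetical worked example used as a single few-shot demonstration.
FEW_SHOT_INPUT = "Yesterday I picked up a lamp, a T-shirt and some orange juice."
FEW_SHOT_OUTPUT = (
    '{"result": {"items": ['
    '{"name": "lamp", "category": "home and living"}, '
    '{"name": "T-shirt", "category": "clothing and shoes"}, '
    '{"name": "orange juice", "category": "drinks"}]}}'
)


def build_messages(text: str) -> list:
    """Build a chat history with an explicit category list and one worked example."""
    allowed = ", ".join(f"'{c}'" for c in get_args(Category))
    system = (
        "You extract purchased items from text. "
        f"Each item's category must be exactly one of: {allowed}. "
        "Never invent new categories; use 'other' when unsure."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Find purchased items in the following text: {FEW_SHOT_INPUT}"},
        {"role": "assistant", "content": FEW_SHOT_OUTPUT},
        {"role": "user", "content": f"Find purchased items in the following text: {text}"},
    ]


messages = build_messages("Today I bought: a table, a chair, a laptop and a beer.")
print(messages[0]["content"])
```

The resulting list would replace the single user message passed as messages= in extract_receipt_data. Declaring the field as category: Category keeps the schema and the prompt on the same list of values.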

GabrielGimenez commented 2 weeks ago

from typing import List, Literal, Optional
from pydantic import BaseModel, Field
import instructor
import tenacity
import logging
from openai import OpenAI

openai_client = instructor.patch(OpenAI(), mode=instructor.Mode.TOOLS)

class PurchasedItem(BaseModel):
    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other"
    ] = Field(description="Correctly assign one of the predefined categories to the item.")

class MaybeReceipt(BaseModel):
    result: Optional[list[PurchasedItem]] = Field(default=None, description="If the text represents a receipt, this field will contain the extracted items.")
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)

def extract_receipt_data(text):
    receipt_data = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=MaybeReceipt,
        max_retries=tenacity.Retrying(
            stop=tenacity.stop_after_attempt(1),
            after=lambda s: logging.info(f"After: {s}"),
        ),
        messages=[{
            "role": "user",
            "content": f"Find purchased items in the following text: {text}"
        }],
    )
    return receipt_data

if __name__ == "__main__":
    text = """
    Today I bought: a table, a chair, a laptop and a beer.
    """
    result = extract_receipt_data(text)
    print(result.model_dump_json(indent=2))

Adding a description to result and removing some nesting from the Pydantic model seems to make it work.
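The two variants differ in how many named sub-schemas sit between the top-level model and the category enum. One quick way to compare them, assuming Pydantic v2, is to list the $defs that each variant generates. The class names NestedMaybeReceipt and FlatMaybeReceipt are hypothetical labels for the two versions in this thread:

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class PurchasedItem(BaseModel):
    name: str
    category: Literal["drinks", "home and living", "clothing and shoes", "other"]


class ReceiptModel(BaseModel):
    items: List[PurchasedItem]


class NestedMaybeReceipt(BaseModel):
    # Original shape: MaybeReceipt -> ReceiptModel -> PurchasedItem
    result: Optional[ReceiptModel] = Field(default=None)
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)


class FlatMaybeReceipt(BaseModel):
    # Flattened shape from the comment above: MaybeReceipt -> list of PurchasedItem
    result: Optional[List[PurchasedItem]] = Field(
        default=None,
        description="If the text represents a receipt, this field will contain the extracted items.",
    )
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)


for model in (NestedMaybeReceipt, FlatMaybeReceipt):
    schema = model.model_json_schema()
    print(model.__name__, sorted(schema["$defs"]))
```

This only shows the structural difference. It does not prove that nesting depth is what confuses the model, since the flattened version also gains a description.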

lukyanov commented 2 weeks ago

Well, I don't want to remove the nesting :) I only provided a minimal reproducible example; in reality, I have many fields at the ReceiptModel level.
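A middle ground that keeps ReceiptModel nested is to reuse the part of the workaround that does not depend on flattening: add descriptions, and put them on every level of the hierarchy. This sketch assumes Pydantic v2, and the description texts are illustrative. It has not been verified that descriptions alone restore the categorization:

```python
from typing import List, Literal, Optional

from pydantic import BaseModel, Field


class PurchasedItem(BaseModel):
    name: str = Field(description="The item as written in the text.")
    category: Literal["drinks", "home and living", "clothing and shoes", "other"] = Field(
        description="Exactly one of the predefined categories; use 'other' if none fits."
    )


class ReceiptModel(BaseModel):
    # Other receipt-level fields would sit here alongside items.
    items: List[PurchasedItem] = Field(description="Every purchased item mentioned in the text.")


class MaybeReceipt(BaseModel):
    result: Optional[ReceiptModel] = Field(
        default=None,
        description="If the text represents a receipt, this field will contain the extracted receipt.",
    )
    error: bool = Field(default=False, description="True only if no receipt could be extracted.")
    message: Optional[str] = Field(default=None, description="Explanation when error is True.")


schema = MaybeReceipt.model_json_schema()
print(schema["properties"]["result"]["description"])
print(schema["$defs"]["ReceiptModel"]["properties"]["items"]["description"])
```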

jxnl / instructor

"Maybe" pattern spoils categorization? #760