Open lukyanov opened 3 weeks ago
I've noticed that the more complicated a pydantic object is, especially with nesting, the more likely it is to throw the model off. One way to combat this is to add more detail to the description of the category field, e.g. "If you can't determine which category, use 'other'". I've also seen, but have not tried, adding concrete few-shot examples as schema extras in pydantic.
If anyone has a more thorough solution, I'm definitely interested, as I run into this issue a lot as well.
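The two ideas above (a more detailed description on the category field, and few-shot examples attached as schema extras) could look roughly like the sketch below. It uses only pydantic v2; the example items and the exact description wording are made up for illustration, and whether this actually improves categorization is untested here.

```python
import json
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class PurchasedItem(BaseModel):
    # Hypothetical few-shot examples, attached to the JSON schema as extras.
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"name": "beer", "category": "drinks"},
                {"name": "chair", "category": "home and living"},
                {"name": "laptop", "category": "other"},
            ]
        }
    )

    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other",
    ] = Field(
        # More explicit instructions, including the fallback category.
        description=(
            "Assign exactly one of the predefined categories to the item. "
            "If you can't determine which category, use 'other'."
        )
    )


# The description and the examples both end up in the generated schema.
schema = PurchasedItem.model_json_schema()
print(json.dumps(schema, indent=2))
```

Printing the schema is a quick way to check that the extra hints are really part of what gets generated for the model.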
This might be an issue of 1) few-shot examples and 2) better prompting, tbh.
from typing import List, Literal, Optional
from pydantic import BaseModel, Field
import instructor
import tenacity
import logging
from openai import OpenAI
from enum import Enum

openai_client = instructor.patch(OpenAI(), mode=instructor.Mode.TOOLS)


class PurchasedItem(BaseModel):
    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other"
    ] = Field(description="Correctly assign one of the predefined categories to the item.")


class MaybeReceipt(BaseModel):
    result: Optional[list[PurchasedItem]] = Field(default=None, description="If the text represents a receipt, this field will contain the extracted items.")
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)


def extract_receipt_data(text):
    global openai_client
    receipt_data = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=MaybeReceipt,
        max_retries=tenacity.Retrying(
            stop=tenacity.stop_after_attempt(1),
            after=lambda s: logging.info(f"After: {s}"),
        ),
        messages=[{
            "role": "user",
            "content": f"Find purchased items in the following text: {text}"
        }],
    )
    return receipt_data


if __name__ == "__main__":
    text = """
    Today I bought: a table, a chair, a laptop and a beer.
    """
    result = extract_receipt_data(text)
    print(result.model_dump_json(indent=2))
Adding a description to result and removing some nesting from the pydantic model seems to make it work.
Well, I don't want to remove nesting :) I only provided a minimal reproducible example. In reality, I have many fields on the ReceiptModel level.
What model are you using?
Describe the bug
I try to categorize items with a simple Literal list. It works as expected unless I add the "maybe" pattern described in the docs.
To Reproduce
This code works as expected:
Output:
However, as soon as I add MaybeReceipt, the model stops using the categories defined in Literal:
Output:
The behavior is the same for all OpenAI models.
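One way to investigate the nested case is to look at the JSON schema that pydantic generates for MaybeReceipt, since (as far as I understand) in TOOLS mode that schema is what describes the expected output to the model. The sketch below uses only pydantic v2, with the models copied from the example in this thread. That the schema shape has anything to do with the problem is an assumption, not a confirmed cause.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, Field


class PurchasedItem(BaseModel):
    name: str
    category: Literal[
        "drinks",
        "home and living",
        "clothing and shoes",
        "other",
    ] = Field(description="Correctly assign one of the predefined categories to the item.")


class MaybeReceipt(BaseModel):
    result: Optional[list[PurchasedItem]] = Field(
        default=None,
        description="If the text represents a receipt, this field will contain the extracted items.",
    )
    error: bool = Field(default=False)
    message: Optional[str] = Field(default=None)


schema = MaybeReceipt.model_json_schema()

# The nested model is not inlined: it is defined once under "$defs" ...
category_schema = schema["$defs"]["PurchasedItem"]["properties"]["category"]

# ... and the optional list refers to it through a "$ref" inside an "anyOf".
result_variants = schema["properties"]["result"]["anyOf"]

print(json.dumps(schema, indent=2))
```

In the printed schema, the category values appear as an enum under "$defs", while the result field only points at that definition. Comparing this output with the schema of the version without MaybeReceipt may help narrow down where the categories get lost.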