langchain-ai / langchain

Structured output with ChatOpenAI is not working when structure class has a list of strings #22332

Closed rgallardone closed 1 month ago

rgallardone commented 1 month ago

Example Code

from pydantic import BaseModel, Field
from typing import List
from langchain_openai import ChatOpenAI

class Jokes(BaseModel):
    """List of jokes to tell user."""

    jokes: List[str] = Field(description="List of jokes to tell the user")

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Jokes)

structured_llm.invoke("You MUST tell me more than one joke about cats")

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to get a structured output using a chain with ChatOpenAI. I reproduced the behavior with the very simple scenario shown under Example Code above.

I expected the result to be a list of jokes, but it didn't work, even for this very simple prompt. If I change the code a little bit like this:

class Jokes(BaseModel):
    """List of jokes to tell user."""

    jokes: str = Field(description="List of jokes to tell the user, separated by a semicolon")

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Jokes)

structured_llm.invoke("You MUST tell me more than one joke about cats, and split them with a semicolon")

I get the following output:

{'jokes': "Why don't cats play poker in the jungle? Too many cheetahs.;What do you call a cat that throws all the most expensive parties? The Great Catsby.;Why did the cat sit on the computer? To keep an eye on the mouse."}

Obviously, the model knows how to solve such a simple task, but it doesn't seem to be using the structure correctly when it has a list of strings as the attribute.

I tried the same approach with more complex structured outputs, and the same thing happened.
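
For readers who rely on the semicolon workaround above, the single string can be turned back into a list with a small helper. This is only a sketch: split_jokes is a hypothetical name, not part of LangChain, and the input is the output quoted in this post.

```python
def split_jokes(result, sep=";"):
    """Split the workaround's single 'jokes' string into a list of jokes.

    `result` is the dict returned by the semicolon-based workaround.
    Empty fragments (e.g. from a trailing separator) are dropped.
    """
    parts = (part.strip() for part in result["jokes"].split(sep))
    return [part for part in parts if part]


# Output copied from the post above.
result = {
    "jokes": (
        "Why don't cats play poker in the jungle? Too many cheetahs.;"
        "What do you call a cat that throws all the most expensive parties? "
        "The Great Catsby.;"
        "Why did the cat sit on the computer? To keep an eye on the mouse."
    )
}

jokes = split_jokes(result)
for joke in jokes:
    print(joke)
```

A joke that itself contains a semicolon would be split in the wrong place, so this is a stopgap rather than a replacement for a real List[str] field.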

System Info

langchain==0.2.1
langchain-community==0.2.1
langchain-core==0.2.3
langchain-experimental==0.0.59
langchain-openai==0.1.8
langchain-text-splitters==0.2.0

platform: linux
python version: 3.10.10

wulifu2hao commented 1 month ago

This example could be fixed by changing

from pydantic import BaseModel, Field

to

from langchain_core.pydantic_v1 import BaseModel, Field

Issue #16564 seems related.
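
To check whether the import swap above is likely to matter in a given environment, one option is to look at the installed pydantic major version. The sketch below uses only the standard library. It assumes (this is not stated in the thread) that the problem comes from pydantic 2 being installed while this langchain-core release expects pydantic v1-style models; installed_version and suggested_import are hypothetical helper names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def suggested_import(pydantic_version):
    """Return the import line suggested in this thread for a pydantic version.

    Assumption: with pydantic 2.x installed, the langchain_core.pydantic_v1
    shim is the one to use; with pydantic 1.x the plain import should be
    equivalent.
    """
    if pydantic_version is None:
        return None
    major = int(pydantic_version.split(".")[0])
    if major >= 2:
        return "from langchain_core.pydantic_v1 import BaseModel, Field"
    return "from pydantic import BaseModel, Field"


if __name__ == "__main__":
    version = installed_version("pydantic")
    print("pydantic version:", version)
    print("suggested import:", suggested_import(version))
```

This only reports which import to try; it does not verify that with_structured_output returns the expected result.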

rgallardone commented 1 month ago

Thanks!!! This solved it. Dumb mistake on my side. Much appreciated!