Structured output with ChatOpenAI is not working when structure class has a list of strings

rgallardone commented 1 month ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from pydantic import BaseModel, Field
from typing import List
from langchain_openai import ChatOpenAI

class Jokes(BaseModel):
    """List of jokes to tell user."""

    jokes: List[str] = Field(description="List of jokes to tell the user")

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Jokes)

structured_llm.invoke("You MUST tell me more than one joke about cats")

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to get a structured output using a chain with ChatOpenAI. I reproduced the behavior with this very simple scenario:

from pydantic import BaseModel, Field
from typing import List
from langchain_openai import ChatOpenAI

class Jokes(BaseModel):
    """List of jokes to tell user."""

    jokes: List[str] = Field(description="List of jokes to tell the user")

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Jokes)

structured_llm.invoke("You MUST tell me more than one joke about cats")

I expected the result to be a list of jokes, but it didn't work, even for this very simple prompt. If I change the code a little bit like this:

class Jokes(BaseModel):
    """List of jokes to tell user."""

    jokes: str = Field(description="List of jokes to tell the user, separated by a semicolon")

structured_llm = ChatOpenAI(model="gpt-4").with_structured_output(Jokes)

structured_llm.invoke("You MUST tell me more than one joke about cats, and split them with a semicolon")

I get the following output:

{'jokes': "Why don't cats play poker in the jungle? Too many cheetahs.;What do you call a cat that throws all the most expensive parties? The Great Catsby.;Why did the cat sit on the computer? To keep an eye on the mouse."}

Obviously, the model knows how to solve such a simple task, but it doesn't seem to be using the structure correctly when it has a list of strings as the attribute.

I tried more complex structured outputs as well, and the same thing happened.
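As a stopgap, the semicolon-separated string from the workaround above can be turned back into a list with plain Python. This is only a minimal sketch: `split_jokes` is a hypothetical helper name, not part of LangChain, and it assumes no joke contains a semicolon of its own.

```python
from typing import List


def split_jokes(raw: str) -> List[str]:
    """Split a semicolon-separated string into a list of non-empty, trimmed jokes."""
    # Assumption: ";" never appears inside a joke itself.
    return [part.strip() for part in raw.split(";") if part.strip()]


# Sample value copied from the output shown above.
result = {
    "jokes": "Why don't cats play poker in the jungle? Too many cheetahs.;"
    "What do you call a cat that throws all the most expensive parties? The Great Catsby.;"
    "Why did the cat sit on the computer? To keep an eye on the mouse."
}

jokes = split_jokes(result["jokes"])
for joke in jokes:
    print(joke)
```

This only papers over the problem, since the delimiter convention lives in the prompt rather than in the schema.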

System Info

langchain==0.2.1
langchain-community==0.2.1
langchain-core==0.2.3
langchain-experimental==0.0.59
langchain-openai==0.1.8
langchain-text-splitters==0.2.0

platform: linux
python version: 3.10.10

wulifu2hao commented 1 month ago

This example can be fixed by changing the import

from pydantic import BaseModel, Field

to

from langchain_core.pydantic_v1 import BaseModel, Field

Issue #16564 seems related.

rgallardone commented 1 month ago

Thanks!!! This solved it. Dumb mistake on my side. Much appreciated!
