langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Structured output using ChatOllama with nested schemas #25343

Open julian24bas opened 1 month ago

julian24bas commented 1 month ago

Example Code

from typing import Optional, List
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import ChatOllama

class Person(BaseModel):
    """Information about a person."""

    # ^ Doc-string for the entity Person.
    # This doc-string is sent to the LLM as the description of the schema Person,
    # and it can help to improve extraction results.

    # Note that:
    # 1. Each field is an `optional` -- this allows the model to decline to extract it!
    # 2. Each field has a `description` -- this description is used by the LLM.
    # Having a good description can help improve extraction results.
    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )

class Data(BaseModel):
    """Extracted data about people."""

    # Creates a model so that we can extract multiple entities.
    people: List[Person]

# Define a custom prompt to provide instructions and any additional context.
# 1) You can add examples into the prompt template to improve extraction quality
# 2) Introduce additional parameters to take context into account (e.g., include metadata
#    about the document from which the text was extracted.)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "If you do not know the value of an attribute asked to extract, "
            "return null for the attribute's value.",
        ),
        # Please see the how-to about improving performance with
        # reference examples.
        # MessagesPlaceholder('examples'),
        ("human", "{text}"),
    ]
)

llm = ChatOllama(model="llama3.1", temperature=0)

runnable = prompt | llm.with_structured_output(schema=Data)

text = "Alan Smith is 6 feet tall and has blond hair. Alan Poe is 3 feet tall and has grey hair."
response = runnable.invoke({"text": text})
print(response)

Error Message and Stack Trace (if applicable)

Traceback (most recent call last):
  File "/tmp_extraction_setup.py", line 59, in <module>
    response = runnable.invoke({"text": text})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/xyz_core/runnables/base.py", line 2878, in invoke
    input = context.run(step.invoke, input, config)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/xyz_core/output_parsers/base.py", line 183, in invoke
    return self._call_with_config(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/xyz_core/runnables/base.py", line 1785, in _call_with_config
    context.run(
  File "/lib/python3.11/site-packages/xyz_core/runnables/config.py", line 427, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
           ^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/xyz_core/output_parsers/base.py", line 184, in <lambda>
    lambda inner_input: self.parse_result(
                        ^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/xyz_core/output_parsers/abc_tools.py", line 300, in parse_result
    raise e
  File "/lib/python3.11/site-packages/xyz_core/output_parsers/abc_tools.py", line 295, in parse_result
    pydantic_objects.append(name_dict[res["type"]](**res["args"]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for Data
people
  value is not a valid list (type=type_error.list)

Process finished with exit code 1

Description

I followed the extraction tutorial, running it with ChatOllama from langchain-ollama and a model with tool support (llama3.1). This works fine when the schema is the class Person. However, when I change the schema to Data, I get the error above.

The data which is passed to the pydantic model has this pattern: {'people': '[{"name": "Alan Smith", "height": "6 feet", "hair color": "blond"}, {"name": "Alan Poe", "height": "3 feet", "hair color": "grey"}]'}

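As you can see, the keys for hair color and height are wrong: the model returned 'hair color' and 'height' instead of 'hair_color' and 'height_in_meters'. The value for 'people' is also a JSON-encoded string rather than a list, which is what triggers the validation error. So there is definitely a problem when the Data class uses another pydantic class as the type of an attribute.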
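To make the failure concrete, here is a minimal standard-library sketch that reproduces the shape of the payload above and decodes the stringified list before validation. The helper name coerce_json_strings is hypothetical and not part of LangChain or pydantic; this is only an illustration of a possible workaround, not a fix for the underlying tool-calling behaviour.

```python
import json
from typing import Any, Dict


def coerce_json_strings(args: Dict[str, Any]) -> Dict[str, Any]:
    """Decode string values that themselves contain a JSON array or object.

    Hypothetical helper: values that are not strings, or that do not
    parse as JSON, are passed through unchanged.
    """
    fixed: Dict[str, Any] = {}
    for key, value in args.items():
        if isinstance(value, str) and value.lstrip().startswith(("[", "{")):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass  # keep the original string if it is not valid JSON
        fixed[key] = value
    return fixed


# Payload copied from the tool-call arguments shown above.
raw_args = {
    "people": '[{"name": "Alan Smith", "height": "6 feet", "hair color": "blond"}, '
    '{"name": "Alan Poe", "height": "3 feet", "hair color": "grey"}]'
}

fixed_args = coerce_json_strings(raw_args)
print(type(raw_args["people"]).__name__, "->", type(fixed_args["people"]).__name__)
```

After this step 'people' is a real list, so the list check in Data should no longer fail. The key mismatch is a separate problem: since every Person field is optional, I would expect 'height' and 'hair color' to be ignored and the corresponding fields to stay None, but I have not verified that.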

System Info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
Python Version: 3.11.7 (main, Jan 24 2024, 11:32:46) [Clang 15.0.0 (clang-1500.1.0.2.5)]

Package Information

langchain_core: 0.2.29
langchain: 0.2.12
langchain_community: 0.2.11
langsmith: 0.1.98
langchain_ollama: 0.1.1
langchain_text_splitters: 0.2.2

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.2
async-timeout: Installed. No version info available.
dataclasses-json: 0.6.7
jsonpatch: 1.33
numpy: 1.26.4
ollama: 0.3.1
orjson: 3.10.7
packaging: 24.1
pydantic: 2.8.2
PyYAML: 6.0.2
requests: 2.32.3
SQLAlchemy: 2.0.32
tenacity: 8.5.0
typing-extensions: 4.12.2

corbinklett commented 1 month ago

Related: I've noticed that the new json_schema method of ChatOpenAI's with_structured_output does not support nested TypedDict schemas, and I have not gotten it to work with Pydantic schemas yet either.

You also cannot specify which fields are Required versus NotRequired when using a TypedDict.
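For reference, this is roughly the kind of nested TypedDict schema I mean, written with the standard library only. The class names are made up for illustration. Plain Python does record which keys are required and which are optional (here via total=False inheritance; typing.NotRequired is the newer spelling on Python 3.11+). What is in question is whether that information reaches the schema sent to the model.

```python
from typing import List, TypedDict


class _PersonRequired(TypedDict):
    # Keys declared here are required.
    name: str


class PersonDict(_PersonRequired, total=False):
    # Keys declared under total=False are optional (not required).
    hair_color: str
    height_in_meters: str


class DataDict(TypedDict):
    # Nested schema: a list of PersonDict entries.
    people: List[PersonDict]


# The required/optional split is available at runtime on the class itself.
print(sorted(PersonDict.__required_keys__))
print(sorted(PersonDict.__optional_keys__))
```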