hoangquochung1110 / public-notes

Comprehensive Guide to Enforcing Structured Outputs from LLMs #31

Open hoangquochung1110 opened 2 months ago

hoangquochung1110 commented 2 months ago

Comprehensive Guide to Enforcing Structured Outputs from LLMs

Introduction

Large Language Models (LLMs) typically generate free-form natural language text. However, many applications require structured data that downstream systems can parse and process reliably. This guide documents techniques and technologies that help enforce structured outputs from LLMs, with an in-depth focus on output parsers and function calling.

Overview of Techniques

Prompt Engineering Approaches

Technical Approaches

Deep Dive: Output Parsers

Output parsers transform unstructured LLM responses into structured data formats. They act as a bridge between the flexible text generation of LLMs and the rigid data structures needed in applications.

How Output Parsers Work

  1. Definition Phase: Define a schema that specifies the expected structure (fields, types, constraints)
  2. Extraction Phase: Process the LLM's text output to extract structured data
  3. Validation Phase: Validate extracted data against the schema
  4. Error Handling: Implement retry strategies, fallbacks, or corrections for validation failures
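
The four phases above can be sketched with the standard library alone. This is a minimal illustration, not a reference implementation: `PERSON_SCHEMA`, `call_model`, and the retry wording are hypothetical names chosen for the example, and `call_model` stands in for whatever function sends a prompt to your LLM and returns its text.

```python
import json
from typing import Any, Callable, Dict, Optional

# Definition phase: a hypothetical minimal schema mapping field names to types.
PERSON_SCHEMA: Dict[str, type] = {"name": str, "age": int}


def extract(text: str) -> Dict[str, Any]:
    """Extraction phase: pull the JSON object out of the raw model text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in output")
    return json.loads(text[start : end + 1])


def validate(data: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Validation phase: check that every field exists and has the right type."""
    for field, expected in schema.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"{field} should be {expected.__name__}")
    return data


def parse_with_retries(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[Dict[str, Any]]:
    """Error handling: re-prompt with the error until the output validates."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            return validate(extract(raw), PERSON_SCHEMA)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({exc}). "
                "Reply with JSON only."
            )
    return None


# Usage with a stand-in model whose first reply is malformed.
replies = iter(["Sure! name: Ada", 'Here you go: {"name": "Ada", "age": 36}'])
print(parse_with_retries(lambda _prompt: next(replies), "Describe Ada as JSON."))
```

Feeding the validation error back into the next prompt is one common retry strategy; returning `None` after the last attempt leaves the choice of fallback to the caller.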

Types of Output Parsers

Regex-based Parsers

import re

def parse_person_regex(text):
    name_match = re.search(r"Name: (.*?)(\n|$)", text)
    age_match = re.search(r"Age: (\d+)", text)

    return {
        "name": name_match.group(1) if name_match else None,
        "age": int(age_match.group(1)) if age_match else None
    }

Grammar-based Parsers

# Using the Lark parser library
from lark import Lark, Transformer

# Raw string so that \d reaches Lark's regex engine unchanged
person_grammar = r"""
    start: person
    person: "Person:" NAME "Age:" AGE "Skills:" skills
    skills: SKILL ("," SKILL)*
    NAME: /[a-zA-Z ]+/
    AGE: /\d+/
    SKILL: /[a-zA-Z]+/
    %import common.WS
    %ignore WS
"""

class PersonTransformer(Transformer):
    def start(self, items):
        return items[0]

    def person(self, items):
        # Tokens are str subclasses; convert and trim them explicitly
        return {"name": str(items[0]).strip(), "age": int(items[1]), "skills": items[2]}

    def skills(self, items):
        return [str(skill) for skill in items]

parser = Lark(person_grammar, start="start")

def parse_with_grammar(text):
    try:
        # Apply the transformer to the finished parse tree
        return PersonTransformer().transform(parser.parse(text))
    except Exception as e:
        print(f"Parsing error: {e}")
        return None

JSON/XML Parsers

import json
import re

from jsonschema import ValidationError, validate

# Define JSON schema
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name", "age", "skills"]
}

def parse_json_output(text):
    # Extract JSON from potential surrounding text. The greedy pattern spans
    # from the first "{" to the last "}", so it assumes the text contains
    # a single JSON object.
    json_match = re.search(r'\{.*\}', text, re.DOTALL)
    if not json_match:
        print("No JSON object found")
        return None

    try:
        data = json.loads(json_match.group(0))

        # Validate against schema
        validate(instance=data, schema=person_schema)
        return data
    except json.JSONDecodeError:
        print("Failed to parse JSON")
    except ValidationError as e:
        print(f"Validation error: {e.message}")

    return None
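
The pattern `\{.*\}` used above is greedy: it captures everything from the first `{` to the last `}`, which fails when the model adds more braces after the object. One alternative, sketched here under the assumption that the first decodable object is the one you want, is `json.JSONDecoder.raw_decode`, which parses a single JSON value from a given position and ignores what follows. The helper name `extract_first_json_object` is hypothetical.

```python
import json
from typing import Any, Optional


def extract_first_json_object(text: str) -> Optional[Any]:
    """Return the first decodable JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    position = text.find("{")
    while position != -1:
        try:
            # raw_decode parses one JSON value starting at `position` and
            # ignores whatever follows it.
            value, _end = decoder.raw_decode(text, position)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            position = text.find("{", position + 1)
    return None


reply = 'Result: {"name": "John", "age": 35} (note: {draft} values)'
print(extract_first_json_object(reply))
```

Schema validation would still run on the returned value; this helper only replaces the extraction step.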

Pydantic/Schema Validators

import json
from typing import List, Optional

# @validator is the Pydantic v1-style decorator (deprecated in v2 in favour of @field_validator)
from pydantic import BaseModel, Field, validator

class Skill(BaseModel):
    name: str
    level: str = "beginner"

    @validator("level")
    def validate_level(cls, v):
        # Normalize case and fall back to "beginner" for unknown levels
        valid_levels = ["beginner", "intermediate", "expert"]
        if v.lower() not in valid_levels:
            return "beginner"
        return v.lower()

class Person(BaseModel):
    name: str
    age: int = Field(..., gt=0, lt=150)
    skills: List[Skill]
    contact: Optional[str] = None

def parse_person_data(llm_output: str) -> Optional[Person]:
    try:
        # Basic extraction (simplified): assumes the output is pure JSON
        data = json.loads(llm_output)
        # Validate against schema
        return Person(**data)
    except Exception as e:
        print(f"Parsing error: {e}")
        return None

Advantages of Output Parsers

Limitations of Output Parsers

Deep Dive: Function Calling / Tool Use

Function calling is a more integrated approach: the model is trained to emit structured arguments that match predefined function parameters, rather than free-form text that must be parsed afterwards.

How Function Calling Works

  1. Function Definition: Define functions with clear parameter schemas (typically in JSON Schema format)
  2. Invocation: Prompt the LLM with these function definitions to generate compatible outputs
  3. Execution: Use the structured output to call actual functions in your application
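
The execution step can be sketched without any provider SDK. The example below assumes the model has already returned a function name and a JSON string of arguments, which is the shape most function calling APIs use; `REGISTRY` and `execute_function_call` are hypothetical names for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def create_person(
    name: str, age: int, skills: List[str], contact: Optional[str] = None
) -> Dict[str, Any]:
    """Hypothetical application function the model is allowed to call."""
    person: Dict[str, Any] = {"name": name, "age": age, "skills": skills}
    if contact:
        person["contact"] = contact
    return person


# Registry mapping function names (as advertised to the model) to callables.
REGISTRY: Dict[str, Callable[..., Any]] = {"create_person": create_person}


def execute_function_call(name: str, arguments_json: str) -> Any:
    """Decode the model's JSON arguments and call the matching function."""
    if name not in REGISTRY:
        raise KeyError(f"model requested unknown function: {name}")
    arguments = json.loads(arguments_json)
    return REGISTRY[name](**arguments)


# Simulated model output: a function name plus JSON-encoded arguments.
result = execute_function_call(
    "create_person",
    '{"name": "John", "age": 35, "skills": ["Python", "JavaScript"]}',
)
print(result)
```

Looking the name up in an explicit registry, instead of resolving it dynamically, keeps the model from invoking anything the application did not expose.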

Implementation Approaches

Native Function Calling

# With OpenAI's function calling (openai>=1.0 client, tools API)
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Define the function schema
functions = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

# API call with function definitions; each schema is wrapped as a "function" tool
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
    ],
    tools=[{"type": "function", "function": fn} for fn in functions],
    # Force the model to call create_person
    tool_choice={"type": "function", "function": {"name": "create_person"}}
)

# Extract structured function call parameters
tool_call = response.choices[0].message.tool_calls[0]
function_args = json.loads(tool_call.function.arguments)
print(function_args)

Anthropic Example (Claude 3)

from anthropic import Anthropic

client = Anthropic()
tools = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array", 
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
    ],
    tools=tools
)

# Process tool calls from response (tool calls arrive as "tool_use" content blocks)
for block in response.content:
    if block.type == "tool_use":
        print(block.name)   # Function name
        print(block.input)  # Function parameters (already a dict)

Structured Tool Use Frameworks

from typing import List, Optional

from langchain import hub
from langchain.agents import AgentExecutor, create_structured_chat_agent
from langchain.tools import StructuredTool
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

# Define schema with Pydantic
class PersonCreateInput(BaseModel):
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age in years")
    skills: List[str] = Field(description="List of person's skills")
    contact: Optional[str] = Field(default=None, description="Contact information")

# Create the actual function that will be called
def create_person(name: str, age: int, skills: List[str], contact: Optional[str] = None):
    person = {"name": name, "age": age, "skills": skills}
    if contact:
        person["contact"] = contact
    return f"Created person: {person}"

# Define the tool
create_person_tool = StructuredTool.from_function(
    name="create_person",
    description="Create a new person record",
    func=create_person,
    args_schema=PersonCreateInput
)

# Set up LLM and agent
llm = ChatOpenAI(temperature=0)
tools = [create_person_tool]
# create_structured_chat_agent requires a prompt; this pulls a published one from LangChain Hub
prompt = hub.pull("hwchase17/structured-chat-agent")
agent = create_structured_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run the agent
result = agent_executor.invoke(
    {"input": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
)
print(result["output"])

Agent Frameworks

from typing import Dict, List, Optional, TypedDict

from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph
from pydantic import BaseModel, Field

# Define state
class AgentState(TypedDict):
    messages: List     # The messages in the conversation
    person_data: Dict  # The person data being built

# Tool schema the model fills in when it wants to create a person
class CreatePerson(BaseModel):
    """Create a new person record."""
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age in years")
    skills: List[str] = Field(description="List of person's skills")
    contact: Optional[str] = Field(default=None, description="Contact information")

# Define nodes (each node receives only the state)
def agent(state: AgentState) -> AgentState:
    llm = ChatOpenAI(temperature=0).bind_tools([CreatePerson])
    response = llm.invoke(state["messages"])
    state["messages"].append(response)
    return state

def create_person(state: AgentState) -> AgentState:
    # Read the structured arguments from the model's tool call
    args = state["messages"][-1].tool_calls[0]["args"]
    state["person_data"] = {key: value for key, value in args.items() if value is not None}
    return state

def save_person(state: AgentState) -> AgentState:
    # In a real application, would save to database
    print(f"Saved person: {state['person_data']}")
    return state

def router(state: AgentState) -> str:
    # Route on the presence of a tool call rather than on keywords in the text
    last_message = state["messages"][-1]
    return "create_person" if getattr(last_message, "tool_calls", None) else "end"

# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent)
workflow.add_node("create_person", create_person)
workflow.add_node("save_person", save_person)
workflow.set_entry_point("agent")

# Add edges: routing functions are attached with add_conditional_edges
workflow.add_conditional_edges("agent", router, {"create_person": "create_person", "end": END})
workflow.add_edge("create_person", "save_person")
workflow.add_edge("save_person", END)

# Compile graph
app = workflow.compile()

# Run the workflow
app.invoke({
    "messages": [HumanMessage(content="Create a profile for a developer named John")],
    "person_data": {}
})

Advantages of Function Calling

Limitations of Function Calling

Comparison: Output Parsers vs. Function Calling

Feature | Output Parsers | Function Calling
-- | -- | --
Model Requirements | Works with any LLM | Requires models with function calling capability
Implementation Complexity | Higher (must handle parsing errors) | Lower (structure enforced at generation)
Reliability | Medium (depends on prompt engineering) | High (built-in constraints)
Flexibility | More flexible for varied outputs | More rigid, follows schema strictly
Error Handling | Post-processing | During generation
Integration | Separate generation and parsing steps | Direct integration with application functions
Cost | Lower (no extra context) | Higher (function definitions use tokens)
Maintenance | Regular updates to parsers | Less frequent updates needed

Best Practices for Both Approaches

Schema Design

Prompt Engineering

Error Handling

Implementation

Hybrid Approach Example

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field, validator
from typing import List, Optional
import json

# 1. Define schema with Pydantic
class Person(BaseModel):
    name: str
    age: int = Field(gt=0, lt=150)
    skills: List[str]
    contact: Optional[str] = None

    @validator("skills")
    def validate_skills(cls, v):
        if not v:
            return ["general"]
        return v

# 2. Set up parser
parser = PydanticOutputParser(pydantic_object=Person)

# 3. Create prompt template
template = """
Create a person profile based on the description below.

{format_instructions}

Description: {description}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["description"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# 4. Setup function calling as primary approach
functions = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array", 
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

# 5. Define the hybrid approach
def create_structured_person(description):
    # Try function calling first
    try:
        llm = ChatOpenAI(temperature=0, model="gpt-4")
        response = llm.invoke(
            [("user", description)],
            functions=functions,
            function_call={"name": "create_person"}
        )

        # LangChain exposes the legacy function call via additional_kwargs
        function_call = response.additional_kwargs.get("function_call")
        if function_call:
            # Extract function call data
            function_args = json.loads(function_call["arguments"])
            # Validate with pydantic
            return Person(**function_args)
    except Exception as e:
        print(f"Function calling failed: {e}")

    # Fallback to output parsing approach
    try:
        formatted_prompt = prompt.format(description=description)
        llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
        output = llm.invoke(formatted_prompt)

        # Try to parse the output
        return parser.parse(output.content)
    except Exception as e:
        print(f"Output parsing failed: {e}")

    # Last resort: return a minimal valid object
    return Person(name="Unknown", age=30, skills=["general"])

# Example usage
person = create_structured_person(
    "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."
)
print(person)

Advanced Techniques

Type Coercion and Normalization

from pydantic import BaseModel, Field, validator
from typing import Union, List
from datetime import datetime

class Event(BaseModel):
    name: str
    date: Union[datetime, str]
    attendees: Union[List[str], str, int]

    @validator("date", pre=True)
    def parse_date(cls, v):
        if isinstance(v, datetime):
            return v
        if isinstance(v, str):
            # Try multiple date formats
            for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%B %d, %Y"]:
                try:
                    return datetime.strptime(v, fmt)
                except ValueError:
                    continue
        # If all formats fail (or the type is unexpected), use a default
        return datetime.now()

    @validator("attendees", pre=True)
    def parse_attendees(cls, v):
        if isinstance(v, list):
            return v
        if isinstance(v, str):
            # Handle comma-separated string
            if "," in v:
                return [name.strip() for name in v.split(",")]
            # Handle space-separated string
            return [name.strip() for name in v.split()]
        if isinstance(v, int):
            # Handle just a count
            return [f"Attendee {i+1}" for i in range(v)]
        return []

Extraction Techniques for Embedded Structures

import re

def extract_table_from_text(text):
    # Find lines that look like table rows
    rows = []
    current_table_lines = []
    in_table = False

    for line in text.split("\n"):
        # Check if line has pipe separators like a markdown table
        if "|" in line and not line.strip().startswith("<!--"):
            if not in_table:
                in_table = True
            current_table_lines.append(line)
        elif in_table and line.strip() == "":
            # Empty line ends the table
            if current_table_lines:
                rows.extend(current_table_lines)
                current_table_lines = []
            in_table = False

    # Add any remaining table lines
    if current_table_lines:
        rows.extend(current_table_lines)

    # Parse the table rows
    parsed_rows = []
    for row in rows:
        # Skip separator rows such as |---|---| or | :--- | ---: |
        if "-" in row and re.match(r'^[\s|:\-]+$', row):
            continue

        # Extract cells from the row
        cells = [cell.strip() for cell in row.split("|")]
        # Remove empty cells at start/end (from leading/trailing |)
        if cells and cells[0] == "":
            cells = cells[1:]
        if cells and cells[-1] == "":
            cells = cells[:-1]

        if cells:
            parsed_rows.append(cells)

    # Create structured data
    if not parsed_rows:
        return []

    # Use first row as headers
    headers = parsed_rows[0]
    data = []

    for row in parsed_rows[1:]:
        # Ensure row has same length as headers by padding if needed
        while len(row) < len(headers):
            row.append("")
        # Truncate if too long
        row = row[:len(headers)]

        # Create dict from row
        data.append(dict(zip(headers, row)))

    return data

Dynamic Schema Generation

import json
import re
from typing import Any, Dict, List, Optional, Type

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field, create_model

def generate_schema_from_description(description: str) -> Type[BaseModel]:
    """Generate a Pydantic model from a natural language description."""

    prompt = f"""
    Create a JSON schema for: {description}

    Include appropriate types (string, integer, number, boolean, array, object)
    and required fields. The output should be a valid JSON schema object.
    """

    llm = ChatOpenAI(temperature=0, model="gpt-4")
    response = llm.invoke(prompt)

    try:
        # Extract JSON schema from response
        schema_text = response.content
        schema_match = re.search(r'\{.*\}', schema_text, re.DOTALL)
        if schema_match:
            schema_json = json.loads(schema_match.group(0))
        else:
            schema_json = json.loads(schema_text)

        # Map JSON schema types to Python types
        type_mapping = {
            "string": str,
            "integer": int,
            "number": float,
            "boolean": bool,
            "array": List[Any],
            "object": Dict[str, Any]
        }
        required_fields = schema_json.get("required", [])

        # Create field definitions for Pydantic model
        fields = {}

        for field_name, field_def in schema_json.get("properties", {}).items():
            field_type = field_def.get("type", "string")
            field_description = field_def.get("description", "")

            python_type = type_mapping.get(field_type, Any)

            # Handle arrays with specific item types
            if field_type == "array" and "items" in field_def:
                items_type = field_def["items"].get("type", "string")
                if items_type in type_mapping:
                    python_type = List[type_mapping[items_type]]

            if field_name in required_fields:
                fields[field_name] = (python_type, Field(..., description=field_description))
            else:
                # Optional fields default to None so they may be omitted
                fields[field_name] = (
                    Optional[python_type],
                    Field(default=None, description=field_description),
                )

        # Create the model
        return create_model("DynamicModel", **fields)

    except Exception as e:
        print(f"Failed to generate schema: {e}")
        # Return a basic model as fallback
        return create_model("FallbackModel", content=(Dict[str, Any], ...))

Conclusion

Enforcing structured outputs from LLMs is essential for building reliable applications. Output parsers and function calling represent two complementary approaches, each with strengths and weaknesses.

Output parsers offer flexibility and work with any LLM but require more error handling. Function calling provides more reliable structure but requires specific model capabilities. The best approach often combines these techniques, using function calling when available with output parsers as a fallback or validation layer.

As LLM technology evolves, we can expect more sophisticated techniques for structured outputs. Future developments may include more native structure capabilities in models, better error correction, and more intelligent schema inference.

For critical applications, a hybrid approach that leverages the strengths of multiple techniques will provide the most robust solution.

ctured outputs from LLMs, with an in-depth focus on output parsers and function calling approaches.

Overview of Techniques

Prompt Engineering Approaches

Technical Approaches

Deep Dive: Output Parsers

Output parsers transform unstructured LLM responses into structured data formats. They act as a bridge between the flexible text generation of LLMs and the rigid data structures needed in applications.

How Output Parsers Work

  1. Definition Phase: Define a schema that specifies the expected structure (fields, types, constraints)
  2. Extraction Phase: Process the LLM's text output to extract structured data
  3. Validation Phase: Validate extracted data against the schema
  4. Error Handling: Implement retry strategies, fallbacks, or corrections for validation failures

Types of Output Parsers

Regex-based Parsers

import re

def parse_person_regex(text):
    name_match = re.search(r"Name: (.*?)(\n|$)", text)
    age_match = re.search(r"Age: (\d+)", text)

    return {
        "name": name_match.group(1) if name_match else None,
        "age": int(age_match.group(1)) if age_match else None
    }

Grammar-based Parsers

# Using the Lark parser library
from lark import Lark, Transformer

person_grammar = """
    start: person
    person: "Person:" NAME "Age:" AGE "Skills:" skills
    skills: SKILL ("," SKILL)*
    NAME: /[a-zA-Z ]+/
    AGE: /\d+/
    SKILL: /[a-zA-Z]+/
    %import common.WS
    %ignore WS
"""

class PersonTransformer(Transformer):
    def start(self, items):
        return items[0]

    def person(self, items):
        return {"name": items[0], "age": int(items[1]), "skills": items[2]}

    def skills(self, items):
        return items

parser = Lark(person_grammar, start="start", transformer=PersonTransformer())

def parse_with_grammar(text):
    try:
        return parser.parse(text)
    except Exception as e:
        print(f"Parsing error: {e}")
        return None

JSON/XML Parsers

import json
from jsonschema import validate

# Define JSON schema
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["name", "age", "skills"]
}

def parse_json_output(text):
    try:
        # Extract JSON from potential surrounding text
        # This regex finds the first JSON object in the text
        import re
        json_match = re.search(r'\{.*\}', text, re.DOTALL)
        if json_match:
            json_str = json_match.group(0)
            data = json.loads(json_str)

            # Validate against schema
            validate(instance=data, schema=person_schema)
            return data
    except json.JSONDecodeError:
        print("Failed to parse JSON")
    except Exception as e:
        print(f"Validation error: {e}")

    return None

Pydantic/Schema Validators

from pydantic import BaseModel, Field, validator
from typing import List, Optional

class Skill(BaseModel):
    name: str
    level: str = "beginner"

    @validator("level")
    def validate_level(cls, v):
        valid_levels = ["beginner", "intermediate", "expert"]
        if v.lower() not in valid_levels:
            return "beginner"
        return v.lower()

class Person(BaseModel):
    name: str
    age: int = Field(..., gt=0, lt=150)
    skills: List[Skill]
    contact: Optional[str] = None

def parse_person_data(llm_output: str) -> Person:
    try:
        # Basic extraction (simplified)
        import json
        data = json.loads(llm_output)
        # Validate against schema
        return Person(**data)
    except Exception as e:
        print(f"Parsing error: {e}")
        return None

Advantages of Output Parsers

Limitations of Output Parsers

Deep Dive: Function Calling / Tool Use

Function calling represents a more integrated approach where the LLM is explicitly designed to output in a structured format that matches predefined function parameters.

How Function Calling Works

  1. Function Definition: Define functions with clear parameter schemas (typically in JSON Schema format)
  2. Invocation: Prompt the LLM with these function definitions to generate compatible outputs
  3. Execution: Use the structured output to call actual functions in your application

Implementation Approaches

Native Function Calling

# With OpenAI's function calling
import openai

# Define the function schema
functions = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array", 
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

# API call with function definitions
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
    ],
    functions=functions,
    function_call={"name": "create_person"}
)

# Extract structured function call parameters
function_args = json.loads(response.choices[0].message.function_call.arguments)
print(function_args)

Anthropic Example (Claude 3)

from anthropic import Anthropic

client = Anthropic()
tools = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array", 
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1000,
    messages=[
        {"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
    ],
    tools=tools
)

# Process tool calls from response
for tool_call in response.content:
    if tool_call.type == "tool_call":
        print(tool_call.name)  # Function name
        print(tool_call.input)  # Function parameters

Structured Tool Use Frameworks

from langchain.tools import StructuredTool
from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentExecutor, create_structured_chat_agent
from pydantic import BaseModel, Field
from typing import List

# Define schema with Pydantic
class PersonCreateInput(BaseModel):
    name: str = Field(description="Person's full name")
    age: int = Field(description="Person's age in years")
    skills: List[str] = Field(description="List of person's skills")
    contact: str = Field(description="Contact information", default=None)

# Create the actual function that will be called
def create_person(name: str, age: int, skills: List[str], contact: str = None):
    person = {"name": name, "age": age, "skills": skills}
    if contact:
        person["contact"] = contact
    return f"Created person: {person}"

# Define the tool
create_person_tool = StructuredTool.from_function(
    name="create_person",
    description="Create a new person record",
    func=create_person,
    args_schema=PersonCreateInput
)

# Set up LLM and agent
llm = ChatOpenAI(temperature=0)
tools = [create_person_tool]
agent = create_structured_chat_agent(llm, tools, verbose=True)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run the agent
result = agent_executor.run(
    "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."
)

Agent Frameworks

from langgraph.graph import END, StateGraph
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from typing import Dict, List, Annotated, TypedDict
import json

# Define state
class AgentState(TypedDict):
    messages: Annotated[List, "The messages in the conversation"]
    person_data: Annotated[Dict, "The person data being built"]

# Define tool functions
def create_person(state: AgentState, name: str, age: int, skills: List[str], contact: str = None):
    person = {"name": name, "age": age, "skills": skills}
    if contact:
        person["contact"] = contact
    state["person_data"] = person
    return state

def save_person(state: AgentState):
    # In a real application, would save to database
    print(f"Saved person: {state['person_data']}")
    return state

# Define nodes
def agent(state: AgentState) -> AgentState:
    messages = state["messages"]
    llm = ChatOpenAI()
    response = llm.invoke(messages)
    state["messages"].append(response)
    return state

def router(state: AgentState):
    last_message = state["messages"][-1]
    if "create person" in last_message.content.lower():
        return "create_person"
    elif "save" in last_message.content.lower():
        return "save_person"
    else:
        return "agent"

# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent)
workflow.add_node("create_person", create_person)
workflow.add_node("save_person", save_person)

# Add edges
workflow.add_edge("agent", router)
workflow.add_edge("create_person", "agent")
workflow.add_edge("save_person", END)

# Compile graph
app = workflow.compile()

# Run the workflow
app.invoke({
    "messages": [HumanMessage(content="Create a profile for a developer named John")],
    "person_data": {}
})
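
The `router` above decides the next step by searching the model's reply for substrings, which is fragile when the wording varies. A minimal sketch of an alternative, assuming the model has been prompted to reply with a JSON object carrying an `action` field, is to route on that field instead. The function name `route_from_json`, the `action` key, and the `"finish"` label are illustrative assumptions, not part of LangGraph.

```python
import json

# Hypothetical action names; the first two mirror the node names used in the graph above.
ALLOWED_ACTIONS = {"create_person", "save_person", "finish"}


def route_from_json(content: str) -> str:
    """Return the next step named in a JSON reply, or "finish" if the reply is unusable."""
    try:
        payload = json.loads(content)
    except json.JSONDecodeError:
        # Free-form text instead of JSON: stop rather than guess.
        return "finish"
    if not isinstance(payload, dict):
        return "finish"
    action = payload.get("action")
    if isinstance(action, str) and action in ALLOWED_ACTIONS:
        return action
    return "finish"


print(route_from_json('{"action": "create_person", "name": "John"}'))
print(route_from_json("Sure, I will create person John"))
```

Unknown or malformed replies end the run instead of looping back to the model, which keeps a misbehaving reply from consuming the graph's recursion budget.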

Advantages of Function Calling

Limitations of Function Calling

Comparison: Output Parsers vs. Function Calling

Feature | Output Parsers | Function Calling
-- | -- | --
Model Requirements | Works with any LLM | Requires models with function calling capability
Implementation Complexity | Higher (must handle parsing errors) | Lower (structure enforced at generation)
Reliability | Medium (depends on prompt engineering) | High (built-in constraints)
Flexibility | More flexible for varied outputs | More rigid, follows schema strictly
Error Handling | Post-processing | During generation
Integration | Separate generation and parsing steps | Direct integration with application functions
Cost | Lower (no extra context) | Higher (function definitions use tokens)
Maintenance | Regular updates to parsers | Less frequent updates needed

Best Practices for Both Approaches

Schema Design

Prompt Engineering

Error Handling

Implementation

Hybrid Approach Example

from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field, validator
from typing import List, Optional
import json

# 1. Define schema with Pydantic
class Person(BaseModel):
    name: str
    age: int = Field(gt=0, lt=150)
    skills: List[str]
    contact: Optional[str] = None

    @validator("skills")
    def validate_skills(cls, v):
        if not v:
            return ["general"]
        return v

# 2. Set up parser
parser = PydanticOutputParser(pydantic_object=Person)

# 3. Create prompt template
template = """
Create a person profile based on the description below.

{format_instructions}

Description: {description}
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["description"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)

# 4. Setup function calling as primary approach
functions = [
    {
        "name": "create_person",
        "description": "Create a new person record",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Person's full name"},
                "age": {"type": "integer", "description": "Person's age in years"},
                "skills": {
                    "type": "array", 
                    "items": {"type": "string"},
                    "description": "List of person's skills"
                },
                "contact": {"type": "string", "description": "Contact information"}
            },
            "required": ["name", "age", "skills"]
        }
    }
]

# 5. Define the hybrid approach
def create_structured_person(description):
    # Try function calling first
    try:
        llm = ChatOpenAI(temperature=0, model="gpt-4")
        response = llm.invoke(
            [{"role": "user", "content": description}],
            functions=functions,
            function_call={"name": "create_person"}
        )

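        # LangChain returns an AIMessage; the function call lives in additional_kwargs
        function_call = response.additional_kwargs.get("function_call")
        if function_call:
            # Extract function call data
            function_args = json.loads(function_call["arguments"])
            # Validate with pydantic
            return Person(**function_args)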
    except Exception as e:
        print(f"Function calling failed: {e}")

    # Fallback to output parsing approach
    try:
        formatted_prompt = prompt.format(description=description)
        llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
        output = llm.invoke(formatted_prompt)

        # Try to parse the output
        return parser.parse(output.content)
    except Exception as e:
        print(f"Output parsing failed: {e}")

    # Last resort: return a minimal valid object
    return Person(name="Unknown", age=30, skills=["general"])

# Example usage
person = create_structured_person(
    "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."
)
print(person)
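
The hybrid function above tries each strategy once. A complementary pattern is to retry the same strategy while feeding the validation error back to the model. The sketch below uses only the standard library and a stand-in for the model; `call_llm`, `validate_person`, and `parse_with_retries` are hypothetical names introduced for illustration, and the checks simply mirror the `Person` fields used in this guide.

```python
import json
from typing import Any, Callable, Dict


def validate_person(data: Any) -> Dict[str, Any]:
    """Check the fields used in this guide; raise ValueError with a readable message."""
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    if not isinstance(data.get("name"), str) or not data["name"]:
        raise ValueError("'name' must be a non-empty string")
    age = data.get("age")
    if isinstance(age, bool) or not isinstance(age, int) or not 0 < age < 150:
        raise ValueError("'age' must be an integer between 1 and 149")
    skills = data.get("skills")
    if not isinstance(skills, list) or not all(isinstance(s, str) for s in skills):
        raise ValueError("'skills' must be a list of strings")
    return data


def parse_with_retries(call_llm: Callable[[str], str], prompt: str,
                       max_attempts: int = 3) -> Dict[str, Any]:
    """Call the model, validate its reply, and re-prompt with the error on failure."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        reply = call_llm(current_prompt)
        try:
            return validate_person(json.loads(reply))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {exc}\n"
                "Reply again with only a corrected JSON object."
            )
    raise ValueError(f"no valid output after {max_attempts} attempts: {last_error}")


# Stand-in for a real model: the first reply is invalid, the second is valid.
replies = iter([
    '{"name": "John", "age": "35"}',
    '{"name": "John", "age": 35, "skills": ["Python"]}',
])
person = parse_with_retries(lambda _prompt: next(replies), "Describe John as JSON.")
print(person)
```

Raising after the last attempt, rather than returning a placeholder record, lets the caller decide whether a default object is acceptable for its use case.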

Advanced Techniques

Type Coercion and Normalization

from pydantic import BaseModel, Field, validator
from typing import Union, List
from datetime import datetime

class Event(BaseModel):
    name: str
    date: Union[datetime, str]
    attendees: Union[List[str], str, int]

    @validator("date", pre=True)
    def parse_date(cls, v):
        if isinstance(v, datetime):
            return v
        if isinstance(v, str):
            # Try multiple date formats
            for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%B %d, %Y"]:
                try:
                    return datetime.strptime(v, fmt)
                except ValueError:
                    continue
        # Unparseable or unexpected input falls back to the current time
        return datetime.now()

    @validator("attendees", pre=True)
    def parse_attendees(cls, v):
        if isinstance(v, list):
            return v
        if isinstance(v, str):
            # Handle comma-separated string
            if "," in v:
                return [name.strip() for name in v.split(",")]
            # Handle space-separated string
            return [name.strip() for name in v.split()]
        if isinstance(v, int):
            # Handle just a count
            return [f"Attendee {i+1}" for i in range(v)]
        return []

Extraction Techniques for Embedded Structures

import re

def extract_table_from_text(text):
    # Find lines that look like table rows
    rows = []
    current_table_lines = []
    in_table = False

    for line in text.split("\n"):
        # Check if line has pipe separators like a markdown table
        if "|" in line and not line.strip().startswith("<!--"):
            if not in_table:
                in_table = True
            current_table_lines.append(line)
        elif in_table and line.strip() == "":
            # Empty line ends the table
            if current_table_lines:
                rows.extend(current_table_lines)
                current_table_lines = []
            in_table = False

    # Add any remaining table lines
    if current_table_lines:
        rows.extend(current_table_lines)

    # Parse the table rows
    parsed_rows = []
    for row in rows:
        # Skip separator rows such as "|---|---|" or "| --- | :---: |"
        if "-" in row and re.match(r'^[\s|:\-]+$', row):
            continue

        # Extract cells from the row
        cells = [cell.strip() for cell in row.split("|")]
        # Remove empty cells at start/end (from leading/trailing |)
        if cells and cells[0] == "":
            cells = cells[1:]
        if cells and cells[-1] == "":
            cells = cells[:-1]

        if cells:
            parsed_rows.append(cells)

    # Create structured data
    if not parsed_rows:
        return []

    # Use first row as headers
    headers = parsed_rows[0]
    data = []

    for row in parsed_rows[1:]:
        # Ensure row has same length as headers by padding if needed
        while len(row) < len(headers):
            row.append("")
        # Truncate if too long
        row = row[:len(headers)]

        # Create dict from row
        data.append(dict(zip(headers, row)))

    return data
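
Tables are one kind of embedded structure; JSON objects wrapped in explanatory prose are another common one. A minimal sketch for that case, assuming the reply may contain several objects mixed with brace-like noise, scans for `{` and lets the standard library's `json.JSONDecoder.raw_decode` find where each object ends. The function name `extract_json_objects` and the sample text are illustrative.

```python
import json
from typing import Any, List


def extract_json_objects(text: str) -> List[Any]:
    """Return every JSON object that can be decoded starting at a '{' in text."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            break
        try:
            # raw_decode returns the parsed value and the index just past it
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next candidate
            pos = start + 1
            continue
        found.append(obj)
        pos = end
    return found


sample = 'Here you go: {"name": "John", "skills": ["Python"]} and a typo {oops} then {"age": 35}.'
print(extract_json_objects(sample))
```

Unlike a greedy `\{.*\}` regular expression, this approach does not merge two separate objects into one invalid span, and nested objects are consumed as part of their parent.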

Dynamic Schema Generation

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field, create_model
import json
import re
from typing import Dict, Any, List, Optional, Type

def generate_schema_from_description(description: str) -> Type[BaseModel]:
    """Generate a Pydantic model from a natural language description."""

    prompt = f"""
    Create a JSON schema for: {description}

    Include appropriate types (string, integer, number, boolean, array, object)
    and required fields. The output should be a valid JSON schema object.
    """

    llm = ChatOpenAI(temperature=0, model="gpt-4")
    response = llm.invoke(prompt)

    try:
        # Extract JSON schema from response
        schema_text = response.content
        schema_match = re.search(r'\{.*\}', schema_text, re.DOTALL)
        if schema_match:
            schema_json = json.loads(schema_match.group(0))
        else:
            schema_json = json.loads(schema_text)

        # Map JSON schema types to Python types
        type_mapping = {
            "string": str,
            "integer": int,
            "number": float,
            "boolean": bool,
            "array": List[Any],
            "object": Dict[str, Any]
        }
        required_fields = schema_json.get("required", [])

        # Create field definitions for Pydantic model
        fields = {}

        for field_name, field_def in schema_json.get("properties", {}).items():
            field_type = field_def.get("type", "string")
            # Named field_description to avoid shadowing the function argument
            field_description = field_def.get("description", "")

            python_type = type_mapping.get(field_type, Any)

            # Handle arrays with specific item types
            if field_type == "array" and "items" in field_def:
                items_type = field_def["items"].get("type", "string")
                if items_type in type_mapping:
                    python_type = List[type_mapping[items_type]]

            if field_name in required_fields:
                fields[field_name] = (python_type, Field(..., description=field_description))
            else:
                # Optional fields need an explicit default; without one,
                # Pydantic v2 still treats the field as required
                fields[field_name] = (Optional[python_type], Field(default=None, description=field_description))

        # Create the model
        model_name = "DynamicModel"
        DynamicModel = create_model(model_name, **fields)
        return DynamicModel

    except Exception as e:
        print(f"Failed to generate schema: {e}")
        # Return a basic model as fallback
        return create_model("FallbackModel", content=(Dict[str, Any], ...))
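
Because the schema in this technique is itself LLM output, it may be worth checking it before building a model from it. The sketch below is one possible pre-check using only the standard library; `schema_problems` and `KNOWN_TYPES` are hypothetical names, and the checks cover only the subset of JSON Schema that the function above reads (`properties`, `type`, `required`).

```python
from typing import Any, List

# The JSON Schema types handled by the type mapping in the function above.
KNOWN_TYPES = {"string", "integer", "number", "boolean", "array", "object"}


def schema_problems(schema: Any) -> List[str]:
    """List the reasons a generated schema should not be trusted (empty if none)."""
    if not isinstance(schema, dict):
        return ["schema is not a JSON object"]
    properties = schema.get("properties")
    if not isinstance(properties, dict) or not properties:
        return ["'properties' is missing, empty, or not an object"]

    problems = []
    for name, definition in properties.items():
        if not isinstance(definition, dict):
            problems.append(f"property '{name}' is not an object")
            continue
        declared = definition.get("type")
        if not isinstance(declared, str) or declared not in KNOWN_TYPES:
            problems.append(f"property '{name}' has unsupported type {declared!r}")

    required = schema.get("required", [])
    if not isinstance(required, list):
        problems.append("'required' is not a list")
    else:
        for name in required:
            if not isinstance(name, str) or name not in properties:
                problems.append(f"required field {name!r} is not defined in 'properties'")
    return problems


good = {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]}
bad = {"properties": {"title": {"type": "text"}}, "required": ["title", "year"]}
print(schema_problems(good))
print(schema_problems(bad))
```

A non-empty result could be used to re-prompt the model for a corrected schema instead of silently falling back to a generic model.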

Conclusion

Enforcing structured outputs from LLMs is essential for building reliable applications. Output parsers and function calling represent two complementary approaches, each with strengths and weaknesses.

Output parsers work with any LLM and adapt to varied outputs, but they push more error handling into application code. Function calling yields more reliable structure, but only on models that support it. The best approach often combines the two: function calling where available, with output parsers as a fallback or validation layer.

As LLM technology evolves, we can expect more sophisticated techniques for structured outputs. Future developments may include more native structure capabilities in models, better error correction, and more intelligent schema inference.

For critical applications, a hybrid approach that leverages the strengths of multiple techniques will provide the most robust solution.