Comprehensive Guide to Enforcing Structured Outputs from LLMs
Introduction
Large Language Models (LLMs) typically generate natural language text in a free-form manner. However, many applications require structured data that can be reliably parsed and processed by downstream systems. This guide documents techniques and technologies that help enforce structured outputs from LLMs, with an in-depth focus on output parsers and function calling approaches.
Overview of Techniques
Prompt Engineering Approaches
Few-shot prompting: Providing examples of the desired output format
System prompts with format instructions: Explicit instructions about structure
XML/JSON templates: Including skeleton templates for the model to fill in
Structured response schemas: Requesting specific fields in a particular order
Chain-of-thought prompting: Breaking down the output generation process step by step
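As a minimal sketch of the prompt-engineering approaches above, the snippet below combines an explicit format instruction, a JSON skeleton template, and a single few-shot example into one prompt string (the schema and example values are illustrative, not tied to any particular API):
# Sketch: format instructions + JSON template + one few-shot example in a single prompt.
# The field names and example values are purely illustrative.
FORMAT_INSTRUCTIONS = (
    "Respond ONLY with a JSON object matching this template:\n"
    '{"name": "<string>", "age": <integer>, "skills": ["<string>", ...]}'
)

FEW_SHOT_EXAMPLE = (
    "Description: A data analyst named Maria, 29, who knows SQL and R.\n"
    'Output: {"name": "Maria", "age": 29, "skills": ["SQL", "R"]}'
)

def build_structured_prompt(description: str) -> str:
    return (
        f"{FORMAT_INSTRUCTIONS}\n\n"
        f"Example:\n{FEW_SHOT_EXAMPLE}\n\n"
        f"Description: {description}\nOutput:"
    )

print(build_structured_prompt("A developer named John, 35, who knows Python."))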
Technical Approaches
Output parsers: Post-processing tools that extract structured data
Function calling / Tool use: Defining functions with parameter schemas
JSON mode: Special configurations optimized for JSON output (see the sketch after this list)
Guardrails/validators: Frameworks that validate outputs against schemas
Structured generation libraries: Tools like LangChain, LMQL, etc.
Fine-tuning: Training models on datasets with the desired output format
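The JSON mode mentioned above deserves a concrete illustration: several providers expose a flag that constrains decoding to syntactically valid JSON. A minimal sketch using the OpenAI v1-style Python client (the client usage and model name are assumptions based on that SDK; JSON mode guarantees well-formed JSON, not conformance to your schema, so the schema still has to be described in the prompt and validated afterwards):
# Sketch of JSON mode with the OpenAI v1-style client (assumes OPENAI_API_KEY is set
# and a JSON-mode-capable model). JSON mode only guarantees well-formed JSON output.
import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply with a JSON object containing name, age and skills."},
        {"role": "user", "content": "Profile: John, 35, knows Python and JavaScript."},
    ],
)
data = json.loads(response.choices[0].message.content)
print(data)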
Deep Dive: Output Parsers
Output parsers transform unstructured LLM responses into structured data formats. They act as a bridge between the flexible text generation of LLMs and the rigid data structures needed in applications.
How Output Parsers Work
Definition Phase: Define a schema that specifies the expected structure (fields, types, constraints)
Extraction Phase: Process the LLM's text output to extract structured data
Validation Phase: Validate extracted data against the schema
Error Handling: Implement retry strategies, fallbacks, or corrections for validation failures (see the retry-loop sketch below)
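These phases commonly take the shape of a small parse-validate-retry loop. A schematic, model-agnostic sketch follows; call_llm and parse_and_validate are placeholders for whichever LLM client and schema validation (e.g. json.loads plus jsonschema, or Pydantic) you use:
# Schematic parse/validate/retry loop. call_llm and parse_and_validate are placeholders;
# the schema itself (the "definition phase") lives inside parse_and_validate.
def get_structured_output(prompt, call_llm, parse_and_validate, max_retries=2):
    last_error = None
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw_text = call_llm(current_prompt)        # generation
        try:
            return parse_and_validate(raw_text)    # extraction + validation phases
        except Exception as e:                     # error-handling phase: retry with feedback
            last_error = e
            current_prompt = (
                f"{prompt}\n\nYour previous answer was invalid ({e}). "
                "Return only output that matches the requested format."
            )
    raise ValueError(f"Could not obtain valid structured output: {last_error}")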
Types of Output Parsers
Regex-based Parsers
Use regular expressions to extract patterns from text
Simple but brittle; breaks with minor format changes
Best for consistent, simple outputs
Example use case: Extracting simple key-value pairs or specific patterns
import re
def parse_person_regex(text):
name_match = re.search(r"Name: (.*?)(\n|$)", text)
age_match = re.search(r"Age: (\d+)", text)
return {
"name": name_match.group(1) if name_match else None,
"age": int(age_match.group(1)) if age_match else None
}
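A quick usage sketch on a response in the expected key-value format:
# Usage sketch for parse_person_regex
print(parse_person_regex("Name: John Smith\nAge: 35"))
# -> {'name': 'John Smith', 'age': 35}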
Grammar-based Parsers
Define formal grammars that specify valid output formats
More robust than regex for complex structures
Examples include PEG (Parsing Expression Grammar) parsers
Useful for outputs with nested structures or complex patterns
# Using the Lark parser library
from lark import Lark, Transformer
person_grammar = """
start: person
person: "Person:" NAME "Age:" AGE "Skills:" skills
skills: SKILL ("," SKILL)*
NAME: /[a-zA-Z ]+/
AGE: /\d+/
SKILL: /[a-zA-Z]+/
%import common.WS
%ignore WS
"""
class PersonTransformer(Transformer):
def start(self, items):
return items[0]
def person(self, items):
return {"name": items[0], "age": int(items[1]), "skills": items[2]}
def skills(self, items):
return items
parser = Lark(person_grammar, start="start")

def parse_with_grammar(text):
    try:
        # Apply the transformer to the parse tree explicitly; passing transformer=
        # to the Lark constructor is only supported with the LALR parser.
        tree = parser.parse(text)
        return PersonTransformer().transform(tree)
    except Exception as e:
        print(f"Parsing error: {e}")
        return None
JSON/XML Parsers
Specialized for extracting structured data formats
Often include schema validation
Common in LLM frameworks like LangChain and LlamaIndex
Particularly useful when the LLM can directly generate well-formed JSON/XML
import json
from jsonschema import validate
# Define JSON schema
person_schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"skills": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["name", "age", "skills"]
}
def parse_json_output(text):
try:
# Extract JSON from potential surrounding text
        # This regex grabs everything from the first "{" to the last "}" in the text
import re
json_match = re.search(r'\{.*\}', text, re.DOTALL)
if json_match:
json_str = json_match.group(0)
data = json.loads(json_str)
# Validate against schema
validate(instance=data, schema=person_schema)
return data
except json.JSONDecodeError:
print("Failed to parse JSON")
except Exception as e:
print(f"Validation error: {e}")
return None
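For example, surrounding prose is stripped and only the JSON payload is parsed and validated:
# Usage sketch: parse_json_output tolerates text around the JSON object
llm_text = 'Here is the profile:\n{"name": "John", "age": 35, "skills": ["Python"]}\nLet me know if you need changes.'
print(parse_json_output(llm_text))
# -> {'name': 'John', 'age': 35, 'skills': ['Python']}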
Pydantic/Schema Validators
Use schema libraries to validate and coerce data
Provide clear error messages for invalid data
Allow for complex nested structures and type validation
Great for Python applications with complex data models
from pydantic import BaseModel, Field, validator
from typing import List, Optional
class Skill(BaseModel):
name: str
level: str = "beginner"
@validator("level")
def validate_level(cls, v):
valid_levels = ["beginner", "intermediate", "expert"]
if v.lower() not in valid_levels:
return "beginner"
return v.lower()
class Person(BaseModel):
name: str
age: int = Field(..., gt=0, lt=150)
skills: List[Skill]
contact: Optional[str] = None
def parse_person_data(llm_output: str) -> Optional[Person]:
try:
# Basic extraction (simplified)
import json
data = json.loads(llm_output)
# Validate against schema
return Person(**data)
except Exception as e:
print(f"Parsing error: {e}")
return None
Advantages of Output Parsers
Works with any LLM, even without function calling capabilities
Can be customized for specific application needs
Provides a clear separation between generation and parsing concerns
Can be adapted to handle various formats and edge cases
Limitations of Output Parsers
Parsing can fail if LLM output doesn't match expected format
Requires careful prompt engineering to get consistent formats
Error handling and retry logic add complexity
May need regular updates as LLM behavior changes
Deep Dive: Function Calling / Tool Use
Function calling represents a more integrated approach where the LLM is explicitly designed to output in a structured format that matches predefined function parameters.
How Function Calling Works
Function Definition: Define functions with clear parameter schemas (typically in JSON Schema format)
Invocation: Prompt the LLM with these function definitions to generate compatible outputs
Execution: Use the structured output to call actual functions in your application (see the dispatch sketch below)
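The execution step is ordinary application code: once the model returns a function name plus JSON arguments, look the function up in a registry and call it. A minimal sketch (the registry and the create_person function here are hypothetical stand-ins for your own functions):
import json

# Hypothetical registry mapping function names, as declared to the model, to real callables
def create_person(name: str, age: int, skills: list) -> dict:
    return {"name": name, "age": age, "skills": skills}

FUNCTION_REGISTRY = {"create_person": create_person}

def execute_function_call(function_name: str, arguments_json: str):
    """Dispatch a model-produced function call to the matching Python function."""
    func = FUNCTION_REGISTRY.get(function_name)
    if func is None:
        raise ValueError(f"Model requested unknown function: {function_name}")
    kwargs = json.loads(arguments_json)
    return func(**kwargs)

# Example with arguments as they might come back from the model
print(execute_function_call("create_person", '{"name": "John", "age": 35, "skills": ["Python"]}'))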
Implementation Approaches
Native Function Calling
Supported natively by models such as GPT-4 and Claude 3, and by some open models (e.g., Llama 3) via providers that expose tool-calling endpoints
The model is aware of function signatures and generates compatible outputs
Example platforms: OpenAI API, Anthropic API, Anyscale Endpoints
# With OpenAI's function calling (legacy `functions` API on the pre-1.0 openai SDK;
# newer SDK versions use client.chat.completions.create with tools/tool_choice)
import openai
import json
# Define the function schema
functions = [
{
"name": "create_person",
"description": "Create a new person record",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Person's full name"},
"age": {"type": "integer", "description": "Person's age in years"},
"skills": {
"type": "array",
"items": {"type": "string"},
"description": "List of person's skills"
},
"contact": {"type": "string", "description": "Contact information"}
},
"required": ["name", "age", "skills"]
}
}
]
# API call with function definitions
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
],
functions=functions,
function_call={"name": "create_person"}
)
# Extract structured function call parameters
function_args = json.loads(response.choices[0].message.function_call.arguments)
print(function_args)
Anthropic Example (Claude 3)
from anthropic import Anthropic
client = Anthropic()
tools = [
{
"name": "create_person",
"description": "Create a new person record",
"input_schema": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Person's full name"},
"age": {"type": "integer", "description": "Person's age in years"},
"skills": {
"type": "array",
"items": {"type": "string"},
"description": "List of person's skills"
},
"contact": {"type": "string", "description": "Contact information"}
},
"required": ["name", "age", "skills"]
}
}
]
response = client.messages.create(
model="claude-3-opus-20240229",
max_tokens=1000,
messages=[
{"role": "user", "content": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
],
tools=tools
)
# Process tool use blocks from the response content
for block in response.content:
    if block.type == "tool_use":
        print(block.name)   # Tool name
        print(block.input)  # Tool input (parameters)
Structured Tool Use Frameworks
Libraries like LangChain Tools, CrewAI
Define tools with input/output schemas
Handle routing between multiple possible functions
from langchain.tools import StructuredTool
from langchain.chat_models import ChatOpenAI
from langchain.agents import AgentExecutor, create_structured_chat_agent
from pydantic import BaseModel, Field
from typing import List, Optional
# Define schema with Pydantic
class PersonCreateInput(BaseModel):
name: str = Field(description="Person's full name")
age: int = Field(description="Person's age in years")
skills: List[str] = Field(description="List of person's skills")
    contact: Optional[str] = Field(default=None, description="Contact information")
# Create the actual function that will be called
def create_person(name: str, age: int, skills: List[str], contact: str = None):
person = {"name": name, "age": age, "skills": skills}
if contact:
person["contact"] = contact
return f"Created person: {person}"
# Define the tool
create_person_tool = StructuredTool.from_function(
name="create_person",
description="Create a new person record",
func=create_person,
args_schema=PersonCreateInput
)
# Set up LLM and agent (create_structured_chat_agent also requires a prompt;
# "hwchase17/structured-chat-agent" is a commonly used published prompt for this agent type)
from langchain import hub

llm = ChatOpenAI(temperature=0)
tools = [create_person_tool]
prompt = hub.pull("hwchase17/structured-chat-agent")
agent = create_structured_chat_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run the agent
result = agent_executor.invoke(
    {"input": "Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."}
)
Agent Frameworks
Orchestration systems like AutoGPT, LangGraph
Chain multiple function calls together
Allow for complex workflows with decision-making
from langgraph.graph import END, StateGraph
from langchain_core.messages import AIMessage, HumanMessage
from langchain_openai import ChatOpenAI
from typing import Dict, List, Annotated, TypedDict
import json
# Define state
class AgentState(TypedDict):
messages: Annotated[List, "The messages in the conversation"]
person_data: Annotated[Dict, "The person data being built"]
# Define tool functions (graph nodes receive the state and return an updated state)
def create_person(state: AgentState):
    # In a real application the fields would be extracted from the conversation,
    # e.g. via function calling; hard-coded here to keep the example short.
    person = {"name": "John", "age": 35, "skills": ["Python", "JavaScript"]}
    state["person_data"] = person
    return state
def save_person(state: AgentState):
# In a real application, would save to database
print(f"Saved person: {state['person_data']}")
return state
# Define nodes
def agent(state: AgentState) -> AgentState:
messages = state["messages"]
llm = ChatOpenAI()
response = llm.invoke(messages)
state["messages"].append(response)
return state
def router(state: AgentState):
last_message = state["messages"][-1]
if "create person" in last_message.content.lower():
return "create_person"
elif "save" in last_message.content.lower():
return "save_person"
else:
return "agent"
# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent)
workflow.add_node("create_person", create_person)
workflow.add_node("save_person", save_person)
# Add edges (the router decides where to go after the agent node, so it is a conditional edge)
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", router)
workflow.add_edge("create_person", "agent")
workflow.add_edge("save_person", END)
# Compile graph
app = workflow.compile()
# Run the workflow
app.invoke({
"messages": [HumanMessage(content="Create a profile for a developer named John")],
"person_data": {}
})
Advantages of Function Calling
More reliable structured outputs since the model is constrained
Cleaner integration with application code
Reduced need for complex parsing and error handling
Better validation at generation time
Acts as implicit documentation of expected parameters
Limitations of Function Calling
Not all models support function calling natively
Additional token usage for function definitions
May limit creative outputs when rigid structure is enforced
Model may still occasionally produce invalid outputs
Can be more expensive due to larger context window usage
Comparison: Output Parsers vs. Function Calling
Feature | Output Parsers | Function Calling
--- | --- | ---
Model Requirements | Works with any LLM | Requires models with function calling capability
Implementation Complexity | Higher (must handle parsing errors) | Lower (structure enforced at generation)
Reliability | Medium (depends on prompt engineering) | High (built-in constraints)
Flexibility | More flexible for varied outputs | More rigid, follows schema strictly
Error Handling | Post-processing | During generation
Integration | Separate generation and parsing steps | Direct integration with application functions
Cost | Lower (only format instructions add tokens) | Higher (function definitions use tokens)
Maintenance | Regular updates to parsers | Less frequent updates needed
Best Practices for Both Approaches
Schema Design
Use clear, descriptive field names that the model can understand
Include descriptions for each field to guide the model
Specify types and constraints explicitly
Keep schemas as simple as possible while meeting requirements
Prompt Engineering
Include explicit instructions about the format
Provide examples of the desired output structure
Use clear, consistent terminology between prompts and schemas
Specify validation rules in your prompts
Error Handling
Implement graceful degradation when structure fails
Build retry mechanisms with more explicit instructions
Log and analyze common failure patterns
Consider fallback to simpler schemas when necessary
Implementation
Start with function calling when available
Add validation as a safety net even with function calling
Combine approaches for critical applications
Test extensively with edge cases
Hybrid Approach Example
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel, Field, validator
from typing import List, Optional
import json
# 1. Define schema with Pydantic
class Person(BaseModel):
name: str
age: int = Field(gt=0, lt=150)
skills: List[str]
contact: Optional[str] = None
@validator("skills")
def validate_skills(cls, v):
if not v:
return ["general"]
return v
# 2. Set up parser
parser = PydanticOutputParser(pydantic_object=Person)
# 3. Create prompt template
template = """
Create a person profile based on the description below.
{format_instructions}
Description: {description}
"""
prompt = PromptTemplate(
template=template,
input_variables=["description"],
partial_variables={"format_instructions": parser.get_format_instructions()}
)
# 4. Setup function calling as primary approach
functions = [
{
"name": "create_person",
"description": "Create a new person record",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "Person's full name"},
"age": {"type": "integer", "description": "Person's age in years"},
"skills": {
"type": "array",
"items": {"type": "string"},
"description": "List of person's skills"
},
"contact": {"type": "string", "description": "Contact information"}
},
"required": ["name", "age", "skills"]
}
}
]
# 5. Define the hybrid approach
def create_structured_person(description):
# Try function calling first
try:
        llm = ChatOpenAI(temperature=0, model="gpt-4")
        # Bind the function definitions so they are sent with the request
        llm_with_functions = llm.bind(
            functions=functions,
            function_call={"name": "create_person"}
        )
        response = llm_with_functions.invoke(
            [{"role": "user", "content": description}]
        )
        function_call = response.additional_kwargs.get("function_call")
        if function_call:
            # Extract function call data
            function_args = json.loads(function_call["arguments"])
            # Validate with pydantic
            return Person(**function_args)
except Exception as e:
print(f"Function calling failed: {e}")
# Fallback to output parsing approach
try:
formatted_prompt = prompt.format(description=description)
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
output = llm.invoke(formatted_prompt)
# Try to parse the output
return parser.parse(output.content)
except Exception as e:
print(f"Output parsing failed: {e}")
# Last resort: return a minimal valid object
return Person(name="Unknown", age=30, skills=["general"])
# Example usage
person = create_structured_person(
"Create a profile for a software developer named John who is 35 years old and knows Python and JavaScript."
)
print(person)
Advanced Techniques
Type Coercion and Normalization
Automatically convert between similar types (strings to numbers, etc.)
Normalize formats for dates, phone numbers, currencies
Handle common variants of enumerated values
from pydantic import BaseModel, Field, validator
from typing import Union, List
from datetime import datetime
class Event(BaseModel):
name: str
date: Union[datetime, str]
attendees: Union[List[str], str, int]
@validator("date", pre=True)
def parse_date(cls, v):
if isinstance(v, datetime):
return v
if isinstance(v, str):
try:
# Try multiple date formats
for fmt in ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%B %d, %Y"]:
try:
return datetime.strptime(v, fmt)
except ValueError:
continue
# If all formats fail, use a default
return datetime.now()
except Exception:
return datetime.now()
return datetime.now()
@validator("attendees", pre=True)
def parse_attendees(cls, v):
if isinstance(v, list):
return v
if isinstance(v, str):
# Handle comma-separated string
if "," in v:
return [name.strip() for name in v.split(",")]
# Handle space-separated string
return [name.strip() for name in v.split()]
if isinstance(v, int):
# Handle just a count
return [f"Attendee {i+1}" for i in range(v)]
return []
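For instance, all three of the following raw payloads pass validation and come out with a parsed datetime date and a list of attendees (strings split on commas, counts expanded to placeholder names):
# Usage sketch: differently shaped LLM outputs coerced by the Event validators above
print(Event(name="Launch", date="2024-03-15", attendees="Ana, Ben"))
print(Event(name="Launch", date="03/15/2024", attendees=3))
print(Event(name="Launch", date="March 15, 2024", attendees=["Ana", "Ben"]))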
Extraction Techniques for Embedded Structures
Handle cases where structured data is embedded in prose
Extract tables from text
Identify and parse lists within paragraphs
import re
def extract_table_from_text(text):
# Find lines that look like table rows
rows = []
current_table_lines = []
in_table = False
for line in text.split("\n"):
# Check if line has pipe separators like a markdown table
if "|" in line and not line.strip().startswith("<!--"):
if not in_table:
in_table = True
current_table_lines.append(line)
elif in_table and line.strip() == "":
# Empty line ends the table
if current_table_lines:
rows.extend(current_table_lines)
current_table_lines = []
in_table = False
# Add any remaining table lines
if current_table_lines:
rows.extend(current_table_lines)
# Parse the table rows
parsed_rows = []
for row in rows:
        # Skip separator rows such as "--- | --- | ---"
        if re.match(r'^[\s\-\|]+$', row):
continue
# Extract cells from the row
cells = [cell.strip() for cell in row.split("|")]
# Remove empty cells at start/end (from leading/trailing |)
if cells and cells[0] == "":
cells = cells[1:]
if cells and cells[-1] == "":
cells = cells[:-1]
if cells:
parsed_rows.append(cells)
# Create structured data
if not parsed_rows:
return []
# Use first row as headers
headers = parsed_rows[0]
data = []
for row in parsed_rows[1:]:
# Ensure row has same length as headers by padding if needed
while len(row) < len(headers):
row.append("")
# Truncate if too long
row = row[:len(headers)]
# Create dict from row
data.append(dict(zip(headers, row)))
return data
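A quick usage sketch on a markdown-style table embedded in prose:
# Usage sketch for extract_table_from_text
sample = """Here are the results:

Name | Age | Skills
----|---|------
John | 35  | Python
Ana  | 29  | SQL

Let me know if you need more."""
print(extract_table_from_text(sample))
# -> [{'Name': 'John', 'Age': '35', 'Skills': 'Python'}, {'Name': 'Ana', 'Age': '29', 'Skills': 'SQL'}]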
Dynamic Schema Generation
Create schemas on-the-fly based on user requests
Adapt to changing requirements without code changes
Use model capabilities to generate schemas for validation
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel, Field, create_model
import json
import re
from typing import Dict, Any, List, Union, Optional
def generate_schema_from_description(description: str) -> BaseModel:
"""Generate a Pydantic model from a natural language description."""
prompt = f"""
Create a JSON schema for: {description}
Include appropriate types (string, integer, number, boolean, array, object)
and required fields. The output should be a valid JSON schema object.
"""
llm = ChatOpenAI(temperature=0, model="gpt-4")
response = llm.invoke(prompt)
try:
# Extract JSON schema from response
schema_text = response.content
schema_match = re.search(r'\{.*\}', schema_text, re.DOTALL)
if schema_match:
schema_json = json.loads(schema_match.group(0))
else:
schema_json = json.loads(schema_text)
# Create field definitions for Pydantic model
fields = {}
annotations = {}
for field_name, field_def in schema_json.get("properties", {}).items():
field_type = field_def.get("type", "string")
description = field_def.get("description", "")
# Map JSON schema types to Python types
type_mapping = {
"string": str,
"integer": int,
"number": float,
"boolean": bool,
"array": List[Any],
"object": Dict[str, Any]
}
python_type = type_mapping.get(field_type, Any)
# Handle arrays with specific item types
if field_type == "array" and "items" in field_def:
items_type = field_def["items"].get("type", "string")
if items_type in type_mapping:
python_type = List[type_mapping[items_type]]
# Set as optional if not required
required_fields = schema_json.get("required", [])
if field_name not in required_fields:
python_type = Optional[python_type]
# Add field to model definition
annotations[field_name] = python_type
fields[field_name] = (python_type, Field(description=description))
# Create the model
model_name = "DynamicModel"
DynamicModel = create_model(model_name, **fields)
return DynamicModel
except Exception as e:
print(f"Failed to generate schema: {e}")
# Return a basic model as fallback
return create_model("FallbackModel", content=(Dict[str, Any], ...))
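A usage sketch (the description is illustrative; the exact fields depend on the schema the model returns):
# Usage sketch: build a model dynamically, then inspect its generated JSON schema.
# The resulting fields depend entirely on the schema the LLM produces for this description.
BookModel = generate_schema_from_description(
    "a book record with a title, an author, a publication year and a list of genres"
)
print(BookModel.schema_json(indent=2))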
Conclusion
Enforcing structured outputs from LLMs is essential for building reliable applications. Output parsers and function calling represent two complementary approaches, each with strengths and weaknesses.
Output parsers offer flexibility and work with any LLM but require more error handling. Function calling provides more reliable structure but requires specific model capabilities. The best approach often combines these techniques, using function calling when available with output parsers as a fallback or validation layer.
As LLM technology evolves, we can expect more sophisticated techniques for structured outputs. Future developments may include more native structure capabilities in models, better error correction, and more intelligent schema inference.
For critical applications, a hybrid approach that leverages the strengths of multiple techniques will provide the most robust solution.