tanchongmin / strictjson

A Strict JSON Framework for LLM Outputs
MIT License
202 stars 25 forks source link

Strict JSON v5.0.0

[UPDATE]: For Agentic Framework and for latest updates to StrictJSON, do check out TaskGen (the official Agentic Framework building on StrictJSON). This will make the StrictJSON repo neater and this github will focus on using StrictJSON for LLM Output Parsing

You can also treat StrictJSON repo as the stable repo, as changes will only be updated here after testing is done at the TaskGen repo.

A Strict JSON Framework for LLM Outputs, that fixes problems that json.loads() cannot solve

Base Functionalities (see Tutorial.ipynb)

Tutorials and Community Support

How do I use this?

  1. Download package via command line pip install strictjson
  2. Import the required functions from strictjson
  3. Set up the relevant API Keys for your LLM if needed. Refer to Tutorial.ipynb for how to do it for Jupyter Notebooks.

How does it work?

Features:

1. Basic Generation

Example LLM Definition

def llm(system_prompt: str, user_prompt: str) -> str:
    ''' Here, we use OpenAI for illustration, you can change it to your own LLM '''
    # ensure your LLM imports are all within this function
    from openai import OpenAI

    # define your own LLM here
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        temperature = 0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content

Example Usage

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                    llm = llm)

print(res)

Example Output

{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}

2. Advanced Generation

Example Usage

res = strict_json(system_prompt = 'You are a code generator, generating code to fulfil a task',
                    user_prompt = 'Given array p, output a function named func_sum to return its sum',
                    output_format = {'Elaboration': 'How you would do it',
                                     'C': 'Code',
                                    'Python': 'Code'},
                    llm = llm)

print(res)

Example Output

{'Elaboration': 'Use a loop to iterate through each element in the array and add it to a running total.',

'C': 'int func_sum(int p[], int size) {\n int sum = 0;\n for (int i = 0; i < size; i++) {\n sum += p[i];\n }\n return sum;\n}',

'Python': 'def func_sum(p):\n sum = 0\n for num in p:\n sum += num\n return sum'}

3. Type forcing output variables

LLM-based checks

Example Usage 1

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment, type: Enum["Pos", "Neg", "Other"]',
                                    'Adjectives': 'Array of adjectives, type: List[str]',
                                    'Words': 'Number of words, type: int',
                                    'In English': 'Whether sentence is in English, type: bool'},
                  llm = llm)

print(res)

Example Output 1

{'Sentiment': 'Pos', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7, 'In English': True}

Example Usage 2

res = strict_json(system_prompt = 'You are an expert at organising birthday parties',
                    user_prompt = 'Give me some information on how to organise a birthday',
                    output_format = {'Famous Quote about Age': 'quote with name, type: ensure quote contains the word age',
                                    'Lucky draw numbers': '3 numbers from 1-50, type: List[int]',
                                    'Sample venues': 'Describe two venues, type: List[Dict["Venue", "Description"]]'},
                    llm = llm)

print(res)

Example Output 2

Using LLM to check "The secret of staying young is to live honestly, eat slowly, and lie about your age. - Lucille Ball" to see if it adheres to "quote contains the word age" Requirement Met: True

{'Famous Quote about Age': 'The secret of staying young is to live honestly, eat slowly, and lie about your age. - Lucille Ball', 'Lucky draw numbers': [7, 21, 35],

'Sample venues': [{'Venue': 'Beachside Resort', 'Description': 'A beautiful resort with stunning views of the beach. Perfect for a summer birthday party.'}, {'Venue': 'Indoor Trampoline Park', 'Description': 'An exciting venue with trampolines and fun activities. Ideal for an active and energetic birthday celebration.'}]}

4. Functions

Example Usage 1 (Description only)

# basic configuration with variable names (in order of appearance in fn_description)
fn = Function(fn_description = 'Output a sentence with <obj> and <entity> in the style of <emotion>', 
                     output_format = {'output': 'sentence'},
                     llm = llm)

# Use the function
fn('ball', 'dog', 'happy') #obj, entity, emotion

Example Output 1

{'output': 'The happy dog chased the ball.'}

Example Usage 2 (Examples only)

# Construct the function: infer pattern from just examples without description (here it is multiplication)
fn = Function(fn_description = 'Map <var1> and <var2> to output based on examples', 
                     output_format = {'output': 'final answer'}, 
                     examples = [{'var1': 3, 'var2': 2, 'output': 6}, 
                                 {'var1': 5, 'var2': 3, 'output': 15}, 
                                 {'var1': 7, 'var2': 4, 'output': 28}],
                     llm = llm)

# Use the function
fn(2, 10) #var1, var2

Example Output 2

{'output': 20}

Example Usage 3 (Description and Examples)

# Construct the function: description and examples with variable names
# variable names will be referenced in order of appearance in fn_description
fn = Function(fn_description = 'Output the sum and difference of <num1> and <num2>', 
                 output_format = {'sum': 'sum of two numbers', 
                                  'difference': 'absolute difference of two numbers'},
                 examples = {'num1': 2, 'num2': 4, 'sum': 6, 'difference': 2},
                 llm = llm)

# Use the function
fn(3, 4) #num1, num2

Example Output 3

{'sum': 7, 'difference': 1}

Example Usage 4 (External Function with automatic inference of fn_description and output_format - Preferred)

# Docstring should provide all input variables, otherwise we will add it in automatically
# We will ignore shared_variables, *args and **kwargs
# No need to define llm in Function for External Functions
from typing import List
def add_number_to_list(num1: int, num_list: List[int], *args, **kwargs) -> List[int]:
    '''Adds num1 to num_list'''
    num_list.append(num1)
    return num_list

fn = Function(external_fn = add_number_to_list)

# Show the processed function docstring
print(str(fn))

# Use the function
fn(3, [2, 4, 5])

Example Output 5

Description: Adds <num1: int> to <num_list: list>

Input: ['num1', 'num_list']

Output: {'num_list': 'Array of numbers'}

{'num_list': [2, 4, 5, 3]}

Example Usage 5 (External Function with manually defined fn_description and output_format - Legacy Approach)

def binary_to_decimal(x):
    return int(str(x), 2)

# an external function with a single output variable, with an expressive variable description
fn = Function(fn_description = 'Convert input <x: a binary number in base 2> to base 10', 
            output_format = {'output1': 'x in base 10'},
            external_fn = binary_to_decimal)

# Use the function
fn(10) #x

Example Output 4

{'output1': 2}

5. Integrating with OpenAI JSON Mode

Example Usage

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                    model = 'gpt-3.5-turbo-1106' # Set the model
                    openai_json_mode = True) # Toggle this to True

print(res)

Example Output

{'Sentiment': 'positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 6}

6. Nested Outputs

Example Input

res = strict_json(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': ['Type of Sentiment', 
                                                   'Strength of Sentiment, type: Enum[1, 2, 3, 4, 5]'],
                                    'Adjectives': "Name and Description as separate keys, type: List[Dict['Name', 'Description']]",
                                    'Words': {
                                        'Number of words': 'Word count', 
                                        'Language': {
                                              'English': 'Whether it is English, type: bool',
                                              'Chinese': 'Whether it is Chinese, type: bool'
                                                  },
                                        'Proper Words': 'Whether the words are proper in the native language, type: bool'
                                        }
                                    },
                 llm = llm)

print(res)

Example Output

{'Sentiment': ['Positive', 3],

'Adjectives': [{'Name': 'beautiful', 'Description': 'pleasing to the senses'}, {'Name': 'sunny', 'Description': 'filled with sunshine'}],

'Words':

{'Number of words': 6,

'Language': {'English': True, 'Chinese': False},

'Proper Words': True}

}

7. Return as JSON

8. Async Mode

Example LLM in Async Mode

async def llm_async(system_prompt: str, user_prompt: str):
    ''' Here, we use OpenAI for illustration, you can change it to your own LLM '''
    # ensure your LLM imports are all within this function
    from openai import AsyncOpenAI

    # define your own LLM here
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model='gpt-3.5-turbo',
        temperature = 0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content

Example Input (strict_json_async)

res = await strict_json_async(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful and sunny day',
                    output_format = {'Sentiment': 'Type of Sentiment',
                                    'Adjectives': 'Array of adjectives',
                                    'Words': 'Number of words'},
                                     llm = llm_async) # set this to your own LLM

print(res)

Example Output

{'Sentiment': 'Positive', 'Adjectives': ['beautiful', 'sunny'], 'Words': 7}

Example Input (AsyncFunction)

fn =  AsyncFunction(fn_description = 'Output a sentence with <obj> and <entity> in the style of <emotion>', 
                     output_format = {'output': 'sentence'})

res = await fn('ball', 'dog', 'happy') #obj, entity, emotion

print(res)

Example Output

{'output': 'The dog happily chased the ball.'}