outlines-dev / outlines

Structured Text Generation
https://outlines-dev.github.io/outlines/
Apache License 2.0

Recursive JSON Schemas #330

Open sahewat opened 8 months ago

sahewat commented 8 months ago

Recursive Pydantic definitions appear to be unsupported for lists, unions, and optionals, which I understand to be the basic use cases.

A reproducible example is provided below:

import json

from typing import List, Optional, Union
from pydantic import BaseModel

from outlines.text.json_schema import build_regex_from_schema

class TaskOptional(BaseModel):
    subtask: Optional['TaskOptional']

class TaskWrapperOptional(BaseModel):
    task: TaskOptional

class TaskUnion(BaseModel):
    subtask: Union['TaskUnion', None]

class TaskWrapperUnion(BaseModel):
    task: TaskUnion

class TaskList(BaseModel):
    subtask: List['TaskList']

class TaskWrapperList(BaseModel):
    task: TaskList

TaskWrapperOptional.model_rebuild()
TaskWrapperUnion.model_rebuild()
TaskWrapperList.model_rebuild()

regex = build_regex_from_schema(json.dumps(TaskWrapperOptional.model_json_schema()))
# NotImplementedError: 
# @ outlines/text/json_schema.py:308 

regex = build_regex_from_schema(json.dumps(TaskWrapperUnion.model_json_schema()))
# NotImplementedError: 
# @ outlines/text/json_schema.py:308 

regex = build_regex_from_schema(json.dumps(TaskWrapperList.model_json_schema()))
# RecursionError: maximum recursion depth exceeded while calling a Python object
# @ outlines/text/json_schema.py:174 

I'd be interested in adding this functionality but I'm unsure as to what an "unrolled" recursive definition would look like in terms of the generated regex.
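
To make the "unrolled" idea concrete, here is a minimal sketch of one possible reading: expand the recursive reference a fixed number of times and allow only null at the cut-off. It targets the shape of TaskUnion above (an object whose single "subtask" key holds either another such object or null). The helper name `unrolled_task_regex` and the `max_depth` parameter are hypothetical and are not part of outlines; this is an illustration under those assumptions, not how `build_regex_from_schema` works.

```python
import re

# Optional whitespace between JSON tokens.
WS = r"\s*"


def unrolled_task_regex(max_depth: int) -> str:
    """Build a regex for {"subtask": <task or null>} nested at most
    `max_depth` objects deep (hypothetical helper, not an outlines API)."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # At the cut-off the recursive branch is dropped, so only null remains.
    inner = "null"
    obj = ""
    for _ in range(max_depth):
        # One more level of nesting wrapped around the previous pattern.
        obj = r"\{" + WS + r'"subtask"' + WS + ":" + WS + "(?:" + inner + ")" + WS + r"\}"
        # The next level out may hold this object or null.
        inner = obj + "|null"
    return obj


if __name__ == "__main__":
    pattern = re.compile(unrolled_task_regex(2))
    print(bool(pattern.fullmatch('{"subtask": null}')))
    print(bool(pattern.fullmatch('{"subtask": {"subtask": null}}')))
    print(bool(pattern.fullmatch('{"subtask": {"subtask": {"subtask": null}}}')))
```

With `max_depth=2` the pattern accepts one or two levels of nesting and rejects a third. The trade-off is that the depth bound is arbitrary and the pattern grows with every extra level, so any input deeper than the bound can never be generated.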

rlouf commented 8 months ago

Thank you for opening an issue! It looks like something we should be able to support. Do you mind pasting the JSON schema for these 3 models here?

sahewat commented 8 months ago

In all cases, the schema is available through TaskWrapperxxx.model_json_schema(). I'll post an example of each here as well.

TaskWrapperOptional

{
    "$defs": {
        "Task": {
            "properties": {
                "name": {
                    "title": "Name",
                    "type": "string"
                },
                "subtasks": {
                    "default": [],
                    "items": {
                        "$ref": "#/$defs/Task"
                    },
                    "title": "Subtasks",
                    "type": "array"
                }
            },
            "required": [
                "name"
            ],
            "title": "Task",
            "type": "object"
        },
        "TaskOptional": {
            "properties": {
                "subtask": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/Task"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskOptional",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskOptional"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperOptional",
    "type": "object"
}

TaskWrapperUnion

{
    "$defs": {
        "TaskUnion": {
            "properties": {
                "subtask": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/TaskUnion"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskUnion",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskUnion"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperUnion",
    "type": "object"
}

TaskWrapperList

{
    "$defs": {
        "TaskList": {
            "properties": {
                "subtask": {
                    "items": {
                        "$ref": "#/$defs/TaskList"
                    },
                    "title": "Subtask",
                    "type": "array"
                }
            },
            "required": [
                "subtask"
            ],
            "title": "TaskList",
            "type": "object"
        }
    },
    "properties": {
        "task": {
            "$ref": "#/$defs/TaskList"
        }
    },
    "required": [
        "task"
    ],
    "title": "TaskWrapperList",
    "type": "object"
}

brandonwillard commented 2 months ago

Anyone interested in this feature should know that CFG-structured (context-free grammar) generation is required to truly support it.
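
As a rough illustration of why a grammar helps: a regular expression cannot match arbitrarily deep nesting, while a recursive grammar rule such as `task ::= "{" "\"subtask\"" ":" (task | "null") "}"` can. The sketch below is a small recursive-descent recognizer for that one rule, written in plain Python. The function names are hypothetical, it only validates finished strings rather than guiding generation, and it is not how outlines implements CFG-structured generation.

```python
import re

# The only tokens in this toy language, each with optional leading whitespace.
_TOKEN = re.compile(r'\s*(\{|\}|:|"subtask"|null)')


def tokenize(text):
    """Split the input into tokens, raising ValueError on anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_task(tokens, i=0):
    """Consume one task starting at tokens[i] and return the index after it."""
    if tokens[i:i + 3] != ["{", '"subtask"', ":"]:
        raise ValueError(f"expected an object opening at token {i}")
    i += 3
    if i < len(tokens) and tokens[i] == "null":
        i += 1  # base case: the recursion ends with null
    else:
        i = parse_task(tokens, i)  # recursive case: a nested task
    if i >= len(tokens) or tokens[i] != "}":
        raise ValueError(f"expected a closing brace at token {i}")
    return i + 1


def is_valid_task(text):
    """Return True if the whole input is exactly one well-formed task."""
    try:
        tokens = tokenize(text)
        return parse_task(tokens) == len(tokens)
    except ValueError:
        return False


if __name__ == "__main__":
    deep = '{"subtask": ' * 50 + "null" + "}" * 50
    print(is_valid_task(deep))
    print(is_valid_task('{"subtask": {"subtask": null}'))
```

The same rule covers every nesting depth without a fixed bound; the only limit in this sketch is Python's own recursion limit for extremely deep inputs.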