Outputting invalid json & skipping schema attributes

jpeig commented 10 months ago

Love your project and I've been using it extensively now with vllm.

However, I noticed that by default lm-format-enforcer does not enforce the full schema (unlike Jsonformer, which I used before). It tends to end the output prematurely by skipping attributes, especially those at the end of the schema.

A sample of some output: {\n "event_body": "As you continue your journey in the Dutch Golden Age, you find yourself at a crossroads. The city\'s bustling trade and art scene have brought new opportunities for growth, but also challenges that require strategic decision-making. You must choose between three paths: deepening your knowledge of the world through \'Insight\', strengthening your leadership abilities with \'Force\', or enhancing your diplomatic skills with \'Diplomacy\'. Each path will shape not only how you navigate this era but also how others perceive and interact with you. As the clock ticks, it\'s time to make a choice that will define who you become during this remarkable period in history",\n "location": "The Golden House",\n "trigger_time_of_day": "evening",\n "option-1": {\n "player_option": "\'Insight\' - Dive deeper into understanding the world around us through intellectual exploration and wisdom",\n "internal_dialogue": "I need to expand my knowledge to better navigate this era." \n }, \n "option-2": { \n "player_option":"\'Force\' - Strengthen our physical prowess while maintaining honorable leadership qualities", \n "internal_dialogue":"I should focus on building my strength and character for effective leadership." }, \n "option-3": { \t\t\t\t\t\t \t"player_option":"\'Diplomacy\' - Develop strong interpersonal relationships by fostering harmony among people from various backgrounds", \n "internal_dialogue":"It is crucial to maintain good relationships for success in this era."}, \n }

For every "option-x" it skipped "challenges", "success_effects" and "failure_effects". The output should have ended with an object called 'trigger_conditionals', but instead the root object was closed right after the generated comma. All of these attributes are defined in my schema.

My fix so far is to add the "required" attribute to the schema, but this is quite tedious and may not be watertight.

Furthermore, as you can see, the response ends with a trailing comma, which is not valid JSON. The fact that the enforcer can emit a response that is not valid JSON is concerning.
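
A quick way to confirm the trailing-comma problem is to run the output through a strict parser. The following is a minimal sketch using only Python's standard `json` module; the helper name `is_valid_json` and the shortened sample strings are illustrative stand-ins, not the full output above.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Shortened stand-ins for the output above: the only difference is the
# trailing comma before the closing brace of the root object.
with_trailing_comma = (
    '{"location": "The Golden House", '
    '"option-3": {"player_option": "Diplomacy"}, }'
)
without_trailing_comma = (
    '{"location": "The Golden House", '
    '"option-3": {"player_option": "Diplomacy"}}'
)

print(is_valid_json(with_trailing_comma))     # False
print(is_valid_json(without_trailing_comma))  # True
```

Running a check like this on every response before handing it to downstream code makes the failure visible immediately instead of surfacing later as a parsing error.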

Any recommended approaches to tackling this problem?

Note: I am using the main branch and the logits processor from the vLLM integration.

noamgat commented 10 months ago

Hi! Happy to hear that the library is helping you.

1) By default, I think object fields are not mandatory in a JSON schema. The project aims to be as close to 100% JSON Schema compliant as possible. If you find that, per the JSON Schema definition, an object without a required array means that all fields are required, please link me to it and I will change the parser accordingly. Otherwise, it is possible to add a field to CharacterLevelParserConfig that controls this behavior (maybe json_objects_require_all_fields_by_default: bool) and change the behavior of JsonSchemaParser to use this flag.

2) The trailing comma looks like a bug! It should be possible to submit a unit test that reproduces this. Can you send the schema that you used + the string that you received, and I will see if the parsing tree indeed accepts the trailing comma?
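
To illustrate point 1, here is a minimal sketch of how the `required` keyword changes what counts as a complete object. The function `missing_required` is a hypothetical helper written for this example; it only understands `type: object`, `properties` and `required`, and is not a full JSON Schema validator nor part of lm-format-enforcer.

```python
from typing import Any, List


def missing_required(schema: dict, instance: Any, path: str = "$") -> List[str]:
    """List required properties that are absent from `instance`.

    Recurses into nested objects that are present in the instance.
    """
    if schema.get("type") != "object" or not isinstance(instance, dict):
        return []
    missing = [
        f"{path}.{name}"
        for name in schema.get("required", [])
        if name not in instance
    ]
    for name, subschema in schema.get("properties", {}).items():
        if name in instance:
            missing += missing_required(subschema, instance[name], f"{path}.{name}")
    return missing


# Without "required", every property is optional, so {} is acceptable.
optional_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "location": {"type": "string"}},
}
# With "required", the same properties become mandatory.
strict_schema = {**optional_schema, "required": ["title", "location"]}

print(missing_required(optional_schema, {}))                # []
print(missing_required(strict_schema, {"location": "x"}))   # ['$.title']
```

Under this reading, an enforcer that lets the model close an object early is still schema compliant as long as no `required` list is present.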

jpeig commented 10 months ago

Hi @noamgat

I couldn't find it in the official schema definition. My case is mostly pragmatic: if you define a property, you'd expect it to be generated. By trial and error I found that adding the "required" attribute seems to resolve my problem. As I create schemas programmatically from a templating solution, I can of course implement some custom logic that always inserts the "required" attribute for each object. It was unclear to me that the parser honors this "required" attribute, but it appears that it does, so that would resolve my concerns.
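
The custom logic mentioned above could look roughly like the sketch below. It assumes the schema is already rendered to a plain Python dict; `require_all_properties` is a hypothetical helper name, not something provided by lm-format-enforcer, and it only walks `properties` and `items`.

```python
import copy
from typing import Any


def require_all_properties(schema: dict) -> dict:
    """Return a copy of `schema` in which every object requires all its properties."""
    result = copy.deepcopy(schema)
    _mark_required(result)
    return result


def _mark_required(node: Any) -> None:
    """Add a `required` list to object schemas in place, recursing into children."""
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if node.get("type") == "object" and isinstance(props, dict):
        # Keep an existing "required" list if the author already wrote one.
        node.setdefault("required", list(props))
        for subschema in props.values():
            _mark_required(subschema)
    items = node.get("items")
    if isinstance(items, dict):
        _mark_required(items)


schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "option-1": {
            "type": "object",
            "properties": {
                "player_option": {"type": "string"},
                "internal_dialogue": {"type": "string"},
            },
        },
    },
}

strict = require_all_properties(schema)
print(strict["required"])                              # ['location', 'option-1']
print(strict["properties"]["option-1"]["required"])    # ['player_option', 'internal_dialogue']
```

Because the helper works on a deep copy, the original template output stays untouched and the strict variant can be generated on demand.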
This is the schema template that generated the sample I shared earlier:

You can replace the {{keys}} from the templating logic with "option-1" and "option-2" respectively.

{
    "type": "object",
    "description": "This is an event for the active mission.",
    "properties": {
        "title": {"type": "string"},
        "location": {"type": "string"},
        "trigger_time_of_day": {
            "type": "string",
            "enum": ["morning", "afternoon", "evening", "night"]
        },
        "event_body": {
            "type": "string",
            "description": "keep the player event succinct."
        },
        {% for key in items %}
        "{{ key }}": {
            "type": "object",
            "properties": {
                "player_option": {"type": "string", "description": "write in the second perspective ('you') and write in the active form"},
                "challenges": {
                    "type": "object",
                    "description": "the challenges the player is required to pass - be as strict/accurate as possible",
                    "properties": {
                        "perform_payment": {
                            "type": "boolean",
                            "description": "whether or not (True/False) the player is required to pay or bribe or transfer money in order to proceed"
                        },
                        "perform_insight_check": {
                            "type": "boolean",
                            "description": "whether or not (True/False) the player should roll 'insight' in order to determine success or failure for this action"
                        },
                        "perform_force_check": {
                            "type": "boolean",
                            "description": "whether or not (True/False) the player should roll 'force' in order to determine success or failure for this action"
                        },
                        "perform_diplomacy_check": {
                            "type": "boolean",
                            "description": "whether or not (True/False) the player should roll 'diplomacy' in order to determine success or failure for this action"
                        }
                    }
                },
                "success_effects": {
                    "type": "object",
                    "properties": {
                        "event_body": {
                            "type": "string",
                            "description": "Write a subsequent event_body / narrative about what happens next, assuming they succeeded (write in second perspective, avoid gameplay jargon)"
                        },
                        "gameplay_effects": {
                            "type": "object",
                            "description": "ensure that the gameplay changes strictly align with the narrative change - just provide a True or False per gameplay attribute",
                            "properties": {
                                "item_gained": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the player gains possession over a specific item as a result of this action"
                                },
                                "direct_wealth_increase": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the player increases its direct 'wealth' as a result of this action"
                                },
                                "notoriety_increase": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'notoriety' of the player increases as a result of this action"
                                },
                                "standing_change": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'standing' of the player changes as a result of this action"
                                },
                                "character_change": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'personality' of the player changes as a result of this action"
                                }
                            }
                        }
                    }
                },
                "failure_effects": {
                    "type": "object",
                    "properties": {
                        "event_body": {
                            "type": "string",
                            "description": "Write a subsequent event_body / event narrative about what happens next, assuming they failed (write in second perspective, avoid gameplay jargon)"
                        },
                        "gameplay_effects": {
                            "type": "object",
                            "description": "Ensure that the gameplay changes strictly align with the narrative change - just provide a True or False per gameplay attribute",
                            "properties": {
                                "item_lost": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the player loses possession over a specific item as a result of this action"
                                },
                                "direct_wealth_decrease": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the player decreases its direct 'wealth' as a result of this action"
                                },
                                "notoriety_increase": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'notoriety' of the player increases as a result of this action"
                                },
                                "standing_change": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'standing' of the player changes as a result of this action"
                                },
                                "character_change": {
                                    "type": "boolean",
                                    "description": "whether or not (True/False) the 'character' of the player changes as a result of this action"
                                }
                            }
                        }
                    }
                },
                "internal_dialogue": {
                    "type": "string",
                    "description": "write a line of internal dialogue where the player ponders the consequences of the action in accordance with the communication style and character of the player"
                }
            }
        },
        {% endfor %}
        "trigger_conditionals": {
            "type": "object",
            "properties": {
                "must_trigger_now": {"type": "boolean"},
                "must_trigger_today": {"type": "boolean"},
                "must_trigger_tomorrow": {"type": "boolean"},
                "must_trigger_this_week": {"type": "boolean"},
                "must_trigger_in_morning": {"type": "boolean"},
                "must_trigger_in_afternoon": {"type": "boolean"},
                "must_trigger_in_evening": {"type": "boolean"},
                "must_trigger_at_night": {"type": "boolean"}
            }
        }
    }
}

I was using the TheBloke_MythoMist-7B-AWQ model and the following parameters:

temperature=0.0, frequency_penalty=1.0

jpeig commented 10 months ago

I ran the schema again. Now with the "required" attributes included:

{
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "location": {"type": "string"},
        "trigger_time_of_day": {
            "type": "string",
            "enum": ["morning", "afternoon", "evening", "night"]
        },
        "event_body": {
            "type": "string",
            "description": "keep the player event succinct."
        },
        {% for key in items %}
        "{{ key }}": {
            "type": "object",
            "properties": {
                "player_option": {"type": "string", "description": "write in the second perspective ('you') and write in the active form"},
                "challenges": {
                    "type": "object",
                    "description": "the challenges the player is required to pass - be as strict/accurate as possible",
                    "properties": {
                        "perform_payment": {"type": "boolean", "description": "whether or not (True/False) the player is required to pay or bribe or transfer money in order to proceed"},
                        "perform_insight_check": {"type": "boolean", "description": "whether or not (True/False) the player should roll 'insight' in order to determine success or failure for this action"},
                        "perform_force_check": {"type": "boolean", "description": "whether or not (True/False) the player should roll 'force' in order to determine success or failure for this action"},
                        "perform_diplomacy_check": {"type": "boolean", "description": "whether or not (True/False) the player should roll 'diplomacy' in order to determine success or failure for this action"}
                    },
                    "required": ["perform_payment", "perform_insight_check", "perform_force_check", "perform_diplomacy_check"]
                },
                "success_effects": {
                    "type": "object",
                    "properties": {
                        "event_body": {
                            "type": "string",
                            "description": "Write a subsequent event_body / narrative about what happens next, assuming they succeeded (write in second perspective, avoid gameplay jargon)"
                        },
                        "gameplay_effects": {
                            "type": "object",
                            "description": "ensure that the gameplay changes strictly align with the narrative change - just provide a True or False per gameplay attribute",
                            "properties": {
                                "item_gained": {"type": "boolean", "description": "whether or not (True/False) the player gains possession over a specific item as a result of this action"},
                                "direct_wealth_increase": {"type": "boolean", "description": "whether or not (True/False) the player increases its direct 'wealth' as a result of this action"},
                                "notoriety_increase": {"type": "boolean", "description": "whether or not (True/False) the 'notoriety' of the player increases as a result of this action"},
                                "standing_change": {"type": "boolean", "description": "whether or not (True/False) the 'standing' of the player changes as a result of this action"},
                                "character_change": {"type": "boolean", "description": "whether or not (True/False) the 'personality' of the player changes as a result of this action"}
                            },
                            "required": ["item_gained", "direct_wealth_increase", "notoriety_increase", "standing_change", "character_change"]
                        }
                    },
                    "required": ["event_body", "gameplay_effects"]
                },
                "failure_effects": {
                    "type": "object",
                    "properties": {
                        "event_body": {
                            "type": "string",
                            "description": "Write a subsequent event_body / event narrative about what happens next, assuming they failed (write in second perspective, avoid gameplay jargon)"
                        },
                        "gameplay_effects": {
                            "type": "object",
                            "description": "Ensure that the gameplay changes strictly align with the narrative change - just provide a True or False per gameplay attribute",
                            "properties": {
                                "item_lost": {"type": "boolean", "description": "whether or not (True/False) the player loses possession over a specific item as a result of this action"},
                                "direct_wealth_decrease": {"type": "boolean", "description": "whether or not (True/False) the player decreases its direct 'wealth' as a result of this action"},
                                "notoriety_increase": {"type": "boolean", "description": "whether or not (True/False) the 'notoriety' of the player increases as a result of this action"},
                                "standing_change": {"type": "boolean", "description": "whether or not (True/False) the 'standing' of the player changes as a result of this action"},
                                "character_change": {"type": "boolean", "description": "whether or not (True/False) the 'character' of the player changes as a result of this action"}
                            },
                            "required": ["item_lost", "direct_wealth_decrease", "notoriety_increase", "standing_change", "character_change"]
                        }
                    },
                    "required": ["event_body", "gameplay_effects"]
                },
                "internal_dialogue": {
                    "type": "string",
                    "description": "write a line of internal dialogue where the player ponders the consequences of the action in accordance with the communication style and character of the player"
                }
            },
            "required": ["player_option", "challenges", "success_effects", "failure_effects", "internal_dialogue"]
        },
        {% endfor %}
        "trigger_conditionals": {
            "type": "object",
            "properties": {
                "must_trigger_now": {"type": "boolean"},
                "must_trigger_today": {"type": "boolean"},
                "must_trigger_tomorrow": {"type": "boolean"},
                "must_trigger_this_week": {"type": "boolean"},
                "must_trigger_in_morning": {"type": "boolean"},
                "must_trigger_in_afternoon": {"type": "boolean"},
                "must_trigger_in_evening": {"type": "boolean"},
                "must_trigger_at_night": {"type": "boolean"}
            },
            "required": ["must_trigger_now", "must_trigger_today", "must_trigger_tomorrow", "must_trigger_this_week", "must_trigger_in_morning", "must_trigger_in_afternoon", "must_trigger_in_evening", "must_trigger_at_night"]
        }
    },
    "required": ["title", "location", "trigger_time_of_day", "event_body", "trigger_conditionals"]
}

This is what I got back, without any pydantic errors being thrown:

{\n "event_body": "As word spreads about the quality and precision of your firearms, wealthy patrons from across Europe are eagerly awaiting their custom-made pieces. However, competition is fierce in this industry, and rivals will stop at nothing to steal away potential clients or even sabotage your workmanship. Your objective is to develop innovative designs that surpass those of competitors while safeguarding trade secrets from being stolen or compromised."\n ,\n "location": "Your workshop",\n "trigger_time_of_day": "morning",\n "trigger_conditionals": {\n "must_trigger_in_morning": true,\n "must_trigger_in_afternoon": false,\n "must_trigger_in_evening": false,\n "must_trigger_at_night": false, \n \t \t\t\t"must_trigger_today": true \t\t , \t \t \t"must_trigger_tomorrow": false , \n "must_trigger_this_week" :false , \n "must_trigger_now" :false }, \n "option-1": { \n "player_option":"Investigate the recent thefts and strengthen security measures at your workshop", \n "success_effects":{ \t "gameplay_effects":{ \t"character_change":false,"item_gained":false,"notoriety_increase":true,"standing_change":true,"direct_wealth_increase":false}, \t "event_body":"You decide to investigate the recent thefts at your workshop. After a thorough search with the help of skilled locksmiths and watchmen you discover that someone had managed to breach through a hidden passageway leading into your storage room where all valuable items were kept. You immediately reinforce all entry points with stronger locks and install advanced security systems. 
The news spread among rival gunsmithers about your enhanced security measures which deters them from attempting any further heists."}, \t "failure_effects":{ \t "gameplay_effects":{ \t"character_change":false,"item_lost":true,"notoriety_increase":true,"standing_change" : true,"direct_wealth_decrease" : true}, \t "event_body":"Despite all precautions taken by you and your team members, thieves manage to break into your workshop once again stealing valuable items including some unfinished projects which took months of hard work. This incident tarnishes not only your reputation but also affects the morale of workers who now doubt their safety working under such circumstances."} , "internal_dialogue":"I must find out who\'s been stealing my hard-earned creations before they cause irreparable damage not just financially but also tarnish my reputation as a trusted gunsmith." , "challenges"\r \r :\r \r {"perform_force_check"\r: false\r ,\r \r "perform_payment"\r : false\r , "perform_insight_check"\r: false ,\r \r "perform_diplomacy_check"\r: false }}, \r "option-2"\r: {"player_option":"Focus on creating new designs without worrying about competitors","success_effects"\r:{ \r"gameplay_effects"\r:{ \r"character_change"\r:false,"item_gained"\r:false,"notoriety_increase"\r:true,"standing_change" : true,"direct_wealth_increase" : true}, \r"event_body":"You decide not to let competition affect you anymore instead focus on creating innovative designs that would set new standards in firearms industry. 
As word spread about these unique creations made by you; customers started flocking towardsyour shop seeking customized pieces tailored accordingto their needs resulting in increased sales revenue.\\n\\nHowever this success attractsa rival gunsmith who tries sabotagingyourworkshop hoping it would disrupt production causing financial losses foryou."}, \r"failure_effects"\r:{ \r"gameplay_effects"\r:{ \r"character_change"\r:false,"item_lost":true,"notoriety_increase":true,"standing_change" : true,"direct_wealth_decrease" : true}, "event_body":"Despite having an edge over competitors due todiverse rangeof products offeredbyyou;their tactics prove successful as they manage toget holdofsome crucial blueprintswhich could potentially ruin yearsworthof researchand development donebyyou.\\n\\nThis setback demoralizes both yourselfand employees workingunderpressure fearinglayoffsifbusinessdoesntpickup soon."} , "internal_dialogue":"I will focus on creating unique designs that caterto customer needs rather than worryingabout what others are doing." 
, "challenges"\r \r :\r \r {"perform_force_check"\r: false ,\r \r "perform_payment" : false , "perform_insight_check"\r: false ,\r \r "perform_diplomacy_check"\r: false}} , \r"option-3":{"player_option":"Hire skilled spies/detectives to keep an eyeon rivals\'movements","success_effects":{ \r"gameplay_effects"\r:{ \r"character_change"\r:false , \r"item_gained":false , \r"notoriety_increase":false , \r"standing_change":false , \r"direct_wealth_increase":false} ,\r \r\r "event_body":"Realizingthe importanceofsecuritymeasuresafterthelatestheisthatoccurredatyourworkshop;youdecideto hireprofessionalspies/detectiveswhowouldkeepaneyesontheactivitiesoffcompetitorswithintheindustry.\\n\\nTheirintelligenceandskillshelpidentifyanyfutureplansoftheopposingteamstointerferewithyourbusinessoperationswhichcouldpotentiallyleadtofurtherlossesfordayto dayoperations.\\n\\nHoweverthisdecisioncomeswithitsownfinancialimplicationsasitrequirespayinghighsalariesandallowancesforthosewhoareassignedforthespecialtaskssuchasgatherinformationaboutrivals\'movementsfromvarioussourcesincludingspiesworkingundercoveramongthem.”,\\r\\r“failureneffectss”:\\\\{

Some observations (ignoring the rambling LLM at the end):

The order of the output did not correspond to the order in my schema
"event_body" is inserted twice (at the end)
"trigger_conditionals" is still missing (even though it is a required attribute)
The parser did not throw a pydantic error
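
The duplicate-key observation can be checked mechanically. The sketch below uses the `object_pairs_hook` parameter of the standard `json.loads`; the names `reject_duplicates`, `loads_strict` and `DuplicateKeyError` are made up for this example. Note that it only flags a key repeated within the same object; the same name at different nesting levels (such as "event_body" at the top level and inside "success_effects") is legal.

```python
import json
from typing import Any, List, Tuple


class DuplicateKeyError(ValueError):
    """Raised when a JSON object repeats a key."""


def reject_duplicates(pairs: List[Tuple[str, Any]]) -> dict:
    """Build a dict from key/value pairs, failing on the first repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def loads_strict(text: str) -> Any:
    """Parse JSON, rejecting objects that contain the same key twice."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


sample = '{"event_body": "first", "location": "Your workshop", "event_body": "second"}'

# The default parser silently keeps the last occurrence.
print(json.loads(sample)["event_body"])  # second

try:
    loads_strict(sample)
except DuplicateKeyError as exc:
    print(exc)  # duplicate key: 'event_body'
```

This also explains why a model-level validation step may stay silent: by the time the data reaches it, the default parser has already collapsed the duplicates.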

noamgat commented 10 months ago

Order of output does not have to correspond to order in schema, this is a feature, not a bug. Field order is meaningless, and not forcing it gives the LLM more freedom to express itself, reducing hallucinations.
Can you please upload a "concrete" json schema + example output, without templating logic, so I can easily create a unit test from it?

jpeig commented 9 months ago

While field order is considered meaningless for structured schema output, it is not meaningless to the LLM. For example, some fields can be filled in better once another field has already been generated.
See this issue for a concrete schema + example output
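
For anyone who wants to verify ordering on their own outputs, here is a minimal sketch that compares the key order of a parsed response with the order of `properties` in the schema. `follows_schema_order` is a hypothetical helper, it looks at the top level only, and it relies on Python dicts and `json.loads` preserving insertion order.

```python
import json


def follows_schema_order(schema: dict, instance: dict) -> bool:
    """True if keys of `instance` appear in the same relative order as in the schema."""
    order = {name: index for index, name in enumerate(schema.get("properties", {}))}
    # Keys not declared in the schema are ignored for the comparison.
    positions = [order[key] for key in instance if key in order]
    return positions == sorted(positions)


schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "trigger_time_of_day": {"type": "string"},
        "event_body": {"type": "string"},
    },
}

in_order = json.loads('{"location": "Your workshop", "event_body": "..."}')
out_of_order = json.loads('{"event_body": "...", "location": "Your workshop"}')

print(follows_schema_order(schema, in_order))      # True
print(follows_schema_order(schema, out_of_order))  # False
```

Skipped optional fields do not count as a violation here; only the relative order of the fields that are present is compared.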

noamgat commented 5 months ago

v0.10.1 added support for strict json schema field ordering. Several json schema parsing bugs have been fixed since this issue was raised. Can you check if they still happen? Please reopen if they do.

noamgat / lm-format-enforcer

Outputting invalid json & skipping schema attributes #31