noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Array of ENUMs Fails to Produce >1 Value #102

Closed. NJordan72 closed this 1 month ago

NJordan72 commented 1 month ago

If we define a JSON Schema with an array whose items are restricted to an enum, as below, the generator never allows more than one item in the resulting output.

from lmformatenforcer import CharacterLevelParserConfig, JsonSchemaParser

parser = JsonSchemaParser(
    {
        "properties": {
            "array_of_numbers": {
                "default": [],
                "description": "An array of numbers",
                "items": {
                    "type": "integer",
                    "enum": [1, 2, 3, 4, 5],
                },
                "title": "Items",
                "type": "array",
            }
        },
        "required": ["array_of_numbers"],
        "type": "object",
    },
    config=CharacterLevelParserConfig(max_consecutive_whitespaces=1),
)

As we iterate through available characters we see:

Allowable: [
Allowable: 41352]
Allowable: ]

The final output was {"array_of_numbers":[4]}. After the first value is chosen, the allowable characters should be , and ] (continue the array or close it), but only ] is offered.
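To make the expected behavior concrete, here is a minimal sketch of which characters should be allowed at each point inside the array. It is a toy model, not the library's implementation: expected_allowed is a hypothetical helper, and it assumes single-digit enum values, no whitespace, and no maxItems.

```python
def expected_allowed(array_text: str, enum_digits: str = "12345") -> str:
    """Return the characters that should be allowed after `array_text`.

    `array_text` is the array portion generated so far, e.g. "[4".
    Toy model (assumption): single-digit enum values, no whitespace,
    no maxItems. This is not lm-format-enforcer code.
    """
    if array_text == "":
        return "["                  # the array has to be opened first
    last = array_text[-1]
    if last == "[":
        return enum_digits + "]"    # first item, or close an empty array
    if last == ",":
        return enum_digits          # a separator must be followed by an item
    if last in enum_digits:
        return ",]"                 # item finished: continue or close
    return ""                       # "]" already emitted: array is complete


# Walk the same prefixes as the trace in the report.
for prefix in ["", "[", "[4", "[4,", "[4,1"]:
    print(repr(prefix), "->", expected_allowed(prefix))
```

For the prefix "[4" this returns ",]", which is the set the report expects instead of the lone "]".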

NJordan72 commented 1 month ago

Here is a test that fails...

def test_arrays_with_multiple_enums():
    schema = {
        "properties": {
            "array_of_numbers": {
                "default": [],
                "description": "An array of numbers",
                "items": {
                    "type": "integer",
                    "enum": [1, 2, 3, 4, 5],
                },
                "title": "Items",
                "type": "array",
                "maxItems": 2
            }
        },
        "required": ["array_of_numbers"],
        "type": "object",
    }

    _test_json_schema_parsing_with_string('{"array_of_numbers":[4]}', schema, True)
    _test_json_schema_parsing_with_string('{"array_of_numbers":[4, 1]}', schema, True)
    _test_json_schema_parsing_with_string('{"array_of_numbers":[4, 4]}', schema, True)
    _test_json_schema_parsing_with_string('{"array_of_numbers":[1, 2, 3]}', schema, False)
    _test_json_schema_parsing_with_string('{"array_of_numbers":[6]}', schema, False)
    _test_json_schema_parsing_with_string('{"array_of_numbers":[1, 6]}', schema, False)

If I remove the not is_on_top check here, it works as expected and no other tests break, but I'm not sure whether that is a safe change to make.

The crux of the problem seems to be that num_items is never incremented, and that check is what prevents it.
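As a rough illustration of why the item counter matters, the sketch below models the array portion of the test strings with a toy state machine. ToyEnumArrayState and accepts are hypothetical names, not classes or functions from lm-format-enforcer, and the model assumes single-digit enum values and at most one space after a comma. It reuses the name num_items only to mirror the discussion: the counter is bumped as soon as an item completes, and maxItems is enforced by comparing against it.

```python
class ToyEnumArrayState:
    """Hypothetical stand-in for an array parsing state (not library code)."""

    def __init__(self, enum_digits="12345", max_items=None):
        self.enum_digits = enum_digits
        self.max_items = max_items
        self.num_items = 0   # completed items so far
        self.text = ""       # array characters accepted so far

    def allowed(self):
        """Characters that may follow the text accepted so far."""
        if not self.text:
            return "["
        last = self.text[-1]
        if last == "]":
            return ""                          # array already closed
        if last == "[":
            return self.enum_digits + "]"      # first item or empty array
        if last == ",":
            return self.enum_digits + " "      # next item, optional space
        if last == " ":
            return self.enum_digits            # only one space is allowed
        # The last character is a completed single-digit item.
        full = self.max_items is not None and self.num_items >= self.max_items
        return "]" if full else ",]"

    def add_character(self, ch):
        if ch not in self.allowed():
            raise ValueError(f"{ch!r} not allowed after {self.text!r}")
        self.text += ch
        if ch in self.enum_digits:
            self.num_items += 1   # count the item as soon as it completes


def accepts(array_text, max_items=2):
    """Return True if the toy model accepts the whole array text."""
    state = ToyEnumArrayState(max_items=max_items)
    try:
        for ch in array_text:
            state.add_character(ch)
    except ValueError:
        return False
    return state.text.endswith("]")


# Array portions of the strings used in the failing test above.
for text in ["[4]", "[4, 1]", "[4, 4]", "[1, 2, 3]", "[6]", "[1, 6]"]:
    print(text, accepts(text))
```

In this toy model the first three arrays are accepted and the last three are rejected, matching the True/False expectations in the test. Whether the library's fix takes the same shape is not something this sketch can confirm.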

noamgat commented 1 month ago

Should be fixed in v0.10.2, please reopen if the issue persists.