noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

Array type incorrectly restricts tokens #112

Closed DrChristophFH closed 2 months ago

DrChristophFH commented 3 months ago

It seems the model is still allowed to produce a "," token as the first element of an array, like so:

Schema:

"instructions": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "if": {
        "type": "string"
      },
      "query": {
        "type": "string"
      }
    }
  }
}

Generated output:

"instructions": [
,{
"query": "..."
}]

which is not valid JSON.
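
A quick way to confirm this independently of the library is Python's standard `json` module. This is only a minimal sketch: the outer braces that turn the fragment into a complete document and the helper name `is_valid_json` are added here for illustration and are not part of the original report.

```python
import json

# The fragment from the report, wrapped in braces so it forms a full JSON document.
generated = """{"instructions": [
,{
"query": "..."
}]}"""


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The leading comma inside the array makes the parser fail.
print(is_valid_json(generated))  # False

# The same content without the leading comma parses fine.
print(is_valid_json('{"instructions": [{"query": "..."}]}'))  # True
```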

noamgat commented 3 months ago

Can you make sure you are using the latest version of the library? Which integration is this through?
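
For anyone answering the version question, one possible way to read the installed version is through the standard library's `importlib.metadata`. This sketch assumes the package was installed from PyPI under the distribution name `lm-format-enforcer`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "lm-format-enforcer") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when the package is missing from this environment.
print(installed_version())
```

Comparing that value with the latest release listed on PyPI shows whether an upgrade is needed before retesting.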

noamgat commented 2 months ago

Closing the issue because the problem does not reproduce on the latest version. Please reopen if the issue persists.

Code used to reproduce:

def test_bug_report_leading_comma():
    schema = {
        "properties": {
            "instructions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "if": {
                            "type": "string"
                        },
                        "query": {
                            "type": "string"
                        }
                    }
                }
            }
        },
        "type": "object"
    }
    completion = """{"instructions": [
    ,{
    "query": "..."
    }]}"""
    # False: the completion with the leading comma is expected to be rejected.
    _test_json_schema_parsing_with_string(completion, schema, False)