noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex, etc.) of a language model
MIT License
1.01k stars, 46 forks

Incorrect schemaless JSON mode #55

Closed: idealover closed this issue 6 months ago

idealover commented 6 months ago

I have been trying to run schemaless JSON mode with vLLM, and there seems to be an issue with that integration. Here is the relevant code to reproduce it:

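```python
from lmformatenforcer import JsonSchemaParser

# Create a parser object; passing None means no schema (schemaless JSON mode)
parser = JsonSchemaParser(None)

# Characters the parser accepts at its initial position
allowed_chars = parser.get_allowed_characters()

for char in allowed_chars:
    print(repr(char))
```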

Here is the output:

```
'{'
'0'
'f'
'4'
'\t'
'2'
'\n'
'\r'
'3'
'8'
' '
'6'
't'
'5'
'9'
'.'
'n'
'1'
'['
'-'
'7'
'"'
```

Clearly, numbers should not be allowed at the beginning of a JSON object.
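For context, RFC 8259 defines a JSON text as any serialized value, so a bare number, string, array, `true`, `false`, or `null` is a complete JSON document, not only an object. The sketch here uses a hypothetical helper, `value_kind_for_first_char`, to map most of the characters in the output above to the kind of top-level value they can begin. It also uses Python's standard `json` module to confirm that bare scalars parse as whole documents. It illustrates the JSON grammar only and is not how lm-format-enforcer is implemented.

```python
import json
from typing import Optional


# Hypothetical helper for illustration only; it is not part of
# lm-format-enforcer. It names the kind of JSON value (per RFC 8259)
# that can begin with the given single character.
def value_kind_for_first_char(ch: str) -> Optional[str]:
    if len(ch) != 1:
        return None  # only single characters are meaningful here
    if ch in " \t\n\r":
        return "whitespace"  # insignificant whitespace may precede any value
    if ch == "{":
        return "object"
    if ch == "[":
        return "array"
    if ch == '"':
        return "string"
    if ch == "-" or ch in "0123456789":
        return "number"
    if ch in "tf":
        return "boolean"  # true / false
    if ch == "n":
        return "null"
    return None  # cannot start a JSON value


# Each of these is a complete, valid JSON document on its own.
for doc in ['{"a": 1}', "[1, 2]", '"text"', "42", "-7", "true", "false", "null"]:
    value = json.loads(doc)
    print(f"{doc!r:10} -> {value_kind_for_first_char(doc[0])}: {value!r}")
```

Because any of these value kinds is a legal top-level document, a parser configured for "any JSON" has to allow their starting characters at position zero.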

idealover commented 6 months ago

I got confused between json_object and json. I will use the JSON object schema for my use case.

noamgat commented 6 months ago

I believe the schema {"type":"object"} is what you are looking for, rather than no schema.
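In lm-format-enforcer terms, this would mean passing the dict `{"type": "object"}` to `JsonSchemaParser` instead of `None`. To make the difference between the two modes concrete without depending on the library, here is a minimal standard-library sketch. `is_valid_json` and `is_json_object` are hypothetical helpers that only approximate which complete documents each mode accepts; they do not reproduce how the enforcer constrains generation character by character.

```python
import json
from typing import Any


# Hypothetical helpers for illustration; they mimic the two modes with the
# standard library and are not lm-format-enforcer's implementation.
def is_valid_json(text: str) -> bool:
    """No schema: any well-formed JSON value is acceptable."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def is_json_object(text: str) -> bool:
    """Schema {"type": "object"}: the top-level value must be an object."""
    try:
        value: Any = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(value, dict)


for doc in ['{"answer": 42}', "42", "[1, 2]", "null"]:
    print(f"{doc!r:18} schemaless={is_valid_json(doc)} object_schema={is_json_object(doc)}")
```

Only the first document satisfies both checks; the others are valid JSON but not objects, which is exactly the gap between schemaless mode and `{"type": "object"}`.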