Incorrect schemaless JSON mode

idealover commented 6 months ago

I have been trying to run schemaless JSON mode with vLLM, and there seems to be an issue with that integration. Here is a minimal snippet that reproduces it.

from lmformatenforcer import JsonSchemaParser

# Create a parser with no schema (schemaless mode)
parser = JsonSchemaParser(None)

# Characters the parser accepts as the first character of the output
allowed_chars = parser.get_allowed_characters()

for char in allowed_chars:
    print(repr(char))

Here is the output:

'{'
'0'
'f'
'4'
'\t'
'2'
'\n'
'\r'
'3'
'8'
' '
'6'
't'
'5'
'9'
'.'
'n'
'1'
'['
'-'
'7'
'"'

Clearly, digits should not be allowed as the first character of a JSON object.

idealover commented 6 months ago

I got confused between json_object and json. I will use the JSON object schema for my use case.

noamgat commented 6 months ago

I believe the schema {"type":"object"} is what you are looking for, rather than no schema.
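
For anyone who hits the same confusion: a complete JSON document can be any JSON value, not only an object, which is why a schemaless parser accepts digits, '-', '"', '[', 't', 'f' and 'n' as first characters. The sketch below uses only Python's standard json module, not lm-format-enforcer itself. The sample documents are made up for illustration.

```python
import json

# Each string is a complete, valid JSON document; only the first is an object.
documents = ['{"a": 1}', '[1, 2]', '"text"', '42', '-1.5', 'true', 'false', 'null']

first_chars = set()
object_starts = set()
for doc in documents:
    value = json.loads(doc)  # raises json.JSONDecodeError on invalid JSON
    first_chars.add(doc[0])
    if isinstance(value, dict):
        # Only documents that parse to a JSON object are recorded here
        object_starts.add(doc[0])
    print(f"{doc!r} parses to {type(value).__name__}")

print("first characters of any JSON value:", sorted(first_chars))
print("first characters of a JSON object:", sorted(object_starts))
```

Restricting the output to objects therefore needs a schema such as {"type": "object"}, as suggested above.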

noamgat / lm-format-enforcer

Incorrect schemaless JSON mode #55