Closed hudson-ai closed 5 days ago
Did you check that we get sensible grammars for enum: ["foo", "bar", "baz"]
and enum: [1, 2, 3]
and also const: 1
and const: "foo"
? in particular the regex for 1 should be /1/
I think we can live with select("foo", "bar", "baz")
instead of lexeme("foo|bar|baz")
for enum for now but would be good to fix at some point (compiler can detect all branches that are regexes and combine them)
Did you check that we get sensible grammars for
enum: ["foo", "bar", "baz"]
andenum: [1, 2, 3]
and alsoconst: 1
andconst: "foo"
? in particular the regex for 1 should be/1/
I think we can live with
select("foo", "bar", "baz")
instead oflexeme("foo|bar|baz")
for enum for now but would be good to fix at some point (compiler can detect all branches that are regexes and combine them)
Currently just calling select for enums, so we get this for enum: ["foo", "bar", "baz"]
(1,2,3 looks the same except they aren't treated as json_string lexemes)
[{'Join': {'sequence': [4],
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Lexeme': {'rx': 'foo',
'contextual': None,
'temperature': None,
'json_string': True,
'json_allowed_escapes': None,
'json_raw': None,
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Lexeme': {'rx': 'bar',
'contextual': None,
'temperature': None,
'json_string': True,
'json_allowed_escapes': None,
'json_raw': None,
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Lexeme': {'rx': 'baz',
'contextual': None,
'temperature': None,
'json_string': True,
'json_allowed_escapes': None,
'json_raw': None,
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Select': {'among': [1, 2, 3],
'max_tokens': None,
'name': None,
'capture_name': None}}]
And consts will look the same but without the select. E.g. const: 1
[{'Join': {'sequence': [1],
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Lexeme': {'rx': '(1)',
'contextual': None,
'temperature': None,
'json_string': False,
'json_allowed_escapes': None,
'json_raw': None,
'max_tokens': None,
'name': None,
'capture_name': None}}]
For more complex types, e.g. {"const": {"key": "value"}}, we are getting a full grammar out of it, even though we could in principle rewrite it as a regex...
[{'Join': {'sequence': [9],
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Lexeme': {'rx': 'value',
'contextual': None,
'temperature': None,
'json_string': True,
'json_allowed_escapes': None,
'json_raw': None,
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': '"key"',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': ': ',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': '',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Join': {'sequence': [2, 3, 1],
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': '{',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': ', ',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'String': {'literal': '}',
'max_tokens': None,
'name': None,
'capture_name': None}},
{'Join': {'sequence': [6, 5, 8],
'max_tokens': None,
'name': None,
'capture_name': None}}]
wonderful! good to merge!
I don't think there's much point rewriting general consts as regexes. The one I worry about is a very large enum with just strings in it (I think it's relatively common) - it's easy to do regex and it will avoid parser limits.
I don't think there's much point rewriting general consts as regexes. The one I worry about is a very large enum with just strings in it (I think it's relatively common) - it's easy to do regex and it will avoid parser limits.
Totally. And I like the idea of detecting this kind of situation in the builder itself.
you mean GrammarBuilder
? I think it's not always valid (though it would be valid for JSON)
Re-writes consts as other types, e.g. arrays with specific prefixItems, ints with minimum==maximum, etc. Enums are then written as anyOf(consts).
This lets us use our internal validation logic rather than relying on jsonschema.
Note: I also fixed the grammar-builder's handling of UnsatisfiableSchemaError in anyOf -- it now appropriately propagates only when there are no valid schemas.