Closed sudo-jarvis closed 8 months ago
Hi @Julian, I have framed this initial PR to add support for multiple dialects to our lexer. I wanted to get your insights before proceeding further.
Some of the pending things:
1. Figuring out how to fetch keywords for each dialect: `jsonschema.Draft7Validator.VALIDATORS.keys()` doesn't seem to contain all the keywords for the dialect. E.g., keywords such as `$anchor` and `$vocabulary` are listed at https://www.learnjsonschema.com/2020-12/ but are not present in the function output, so we might need to maintain a list of these keywords manually.
2. Optimizing and simplifying the code further.
3. Identifying when an object is a schema and when it is not.
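If it helps, the manual-list idea from item 1 could be sketched like this. The `VALIDATORS` dict here is a small stand-in for `jsonschema.Draft202012Validator.VALIDATORS` (to keep the example self-contained), and the extra keyword set is illustrative, not exhaustive:

```python
# Sketch: build a per-dialect keyword set by unioning the keywords the
# validator knows about with a hand-maintained set of keywords that carry
# no validation logic (and so are missing from VALIDATORS).

# Stand-in for jsonschema.Draft202012Validator.VALIDATORS:
VALIDATORS = {"type": None, "properties": None, "if": None}

# Illustrative, not exhaustive; the full list would come from the spec /
# https://www.learnjsonschema.com/2020-12/
EXTRA_KEYWORDS_2020_12 = {"$anchor", "$dynamicAnchor", "$vocabulary", "then"}

KEYWORDS_2020_12 = frozenset(VALIDATORS) | EXTRA_KEYWORDS_2020_12
```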
@Julian, please have a look.
Cases handled:
- [x] ~~Embedded schemas can have a different dialect.~~ Sample:
- [x] ~~The presence of an identifier (`$id` or `id`, depending on dialect) determines that the subschema is a schema resource.~~ Sample:
- [x] ~~Treat an unknown dialect as an embedded schema even though I don't know if it declares an identifier, and treat all of its properties as non-keywords.~~ Sample:
- [ ] Properties are only keywords inside schemas. **Still need to find a way to figure this out.**
> The function `jsonschema.Draft7Validator.VALIDATORS.keys()` doesn't seem to contain all the keywords for the dialect. E.g., keywords such as `$anchor` and `$vocabulary` are listed at https://www.learnjsonschema.com/2020-12/ but are not present in the function output, so we might need to maintain a list of these keywords manually.
Yeah, I forgot -- I was going to say the only ones you'll be missing are the annotation keywords, but you're right, `$anchor` and `$dynamicAnchor` also won't be there, nor even will `then`, as it's handled by `if`.
As I mentioned elsewhere, someday this will live in `jsonschema-specifications`, so I guess do whatever is easiest for you to make sure you've got them, whether that's hardcoding the list from scratch or else using something like `jsonschema.SomeValidator.VALIDATORS.keys() | {"missing", "keywords"}`.
The second thing you need is what I mentioned in the Bowtie pull request: a way to explicitly indicate what dialect a schema with no `$schema` is.
So you want a classmethod, `JSONSchemaLexer.with_implicit_dialect(dialect: URI) -> JSONSchemaLexer`, that takes a dialect to assume and then selects that dialect's keywords -- unless of course the schema ends up containing a `$schema` keyword, in which case you use that URI.
But I'd add that ASAP; once you have it, we can think about merging this on the Bowtie side, because Bowtie has that info (which dialect to assume) to provide.
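A rough sketch of what that could look like. The class body here is a stand-in (the real lexer would subclass a Pygments lexer); it only shows the dialect-selection logic, with the precedence rule described above:

```python
# Hypothetical sketch of the suggested with_implicit_dialect API; names
# follow the PR discussion, but the implementation is illustrative only.
class JSONSchemaLexer:
    def __init__(self, implicit_dialect=None):
        # Dialect to assume when a schema has no $schema keyword.
        self.implicit_dialect = implicit_dialect

    @classmethod
    def with_implicit_dialect(cls, dialect):
        return cls(implicit_dialect=dialect)

    def dialect_for(self, schema):
        # An explicit $schema keyword always wins over the assumed dialect.
        return schema.get("$schema", self.implicit_dialect)
```

With this shape, `JSONSchemaLexer.with_implicit_dialect("https://json-schema.org/draft/2020-12/schema")` would treat `$schema`-less schemas as 2020-12, while a schema carrying its own `$schema` would override the assumption.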
Oh, and since I didn't say it, nice progress already!
@Julian, I've added support for specifying a dialect when instantiating the lexer: `JSONSchemaLexer()` creates the default lexer with no dialect specified, while `JSONSchemaLexer(dialect_uri)` creates a lexer for the specified dialect.
Samples:
1. When nothing is specified:
2. When no `$schema` keyword is specified but the specified dialect is `2020-12`:
3. When the schema specifies `2019-09` but the dialect explicitly specified is `2020-12`, the `$schema` keyword takes precedence:
Very nice. Will have a look at the code but screenshots look good.
Another issue to be aware of is that not all dialects support boolean schemas, so that's another thing we could highlight correctly as an edge case.
Specifically, drafts 3 and 4 do not, so if the lexer encounters a boolean in a place where a schema belongs, we could highlight it in an error color. But let's do that kind of thing later; it will require knowledge about which keywords contain schemas, I think. (Open an issue though, maybe, if you don't mind.)
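As a sketch of that edge case -- the URIs are the drafts' real meta-schema URIs, but the helper name and shape are hypothetical:

```python
# Drafts 3 and 4 have no boolean schemas; later drafts allow true/false
# wherever a schema is expected.
BOOLEAN_SCHEMAS_UNSUPPORTED = {
    "http://json-schema.org/draft-03/schema#",
    "http://json-schema.org/draft-04/schema#",
}

def boolean_schema_is_valid(dialect_uri):
    # Whether `true`/`false` in a schema position is legal for the dialect.
    return dialect_uri not in BOOLEAN_SCHEMAS_UNSUPPORTED
```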
Nice! Let's merge this and see how it goes. Well done.
This PR addresses adding support for multiple dialects in the lexer.