python-jsonschema / jsonschema-lexer

A Pygments lexer for JSON Schema
MIT License

Add support for multiple dialects #4

Closed: sudo-jarvis closed this 8 months ago

sudo-jarvis commented 8 months ago

This PR adds support for multiple dialects to the lexer.

sudo-jarvis commented 8 months ago

Hi @Julian, I have put together this initial PR to add support for multiple dialects to our lexer. I wanted to get your input before going further.

Some of the pending things:

1. Figuring out how to fetch the keywords for each dialect: `jsonschema.Draft7Validator.VALIDATORS.keys()` doesn't seem to contain all the keywords for the dialect. E.g., keywords such as `$anchor` and `$vocabulary` are listed at https://www.learnjsonschema.com/2020-12/ but are missing from its output, so we might need to maintain a list of these keywords manually.

2. Optimizing and simplifying the code further

3. Identifying when an object is a schema and when it is not

sudo-jarvis commented 8 months ago

@Julian, please have a look.

Cases handled:

Sample:

[Screenshots: 2024-03-01 at 2:40:41 PM and 2:41:02 PM]

- [x] ~~Embedded schemas can have a different dialect.~~ Sample: [Screenshot: 2024-03-01 at 5:34:20 PM]

- [x] ~~The presence of an identifier (`$id` or `id`, depending on the dialect) determines that the subschema is a schema resource.~~ Samples: [Screenshots: 2024-03-01 at 5:37:01 PM, 5:38:33 PM, and 5:41:40 PM]

- [x] ~~Treat an unknown dialect as an embedded schema, even though I don't know whether it declares an identifier, and treat all of its properties as non-keywords.~~ Sample: [Screenshot: 2024-03-01 at 5:42:19 PM]

**Still need to find a way to handle this:**

- [ ] Properties are only keywords inside schemas.
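The three checked rules can be sketched on already-parsed JSON as follows. This is a minimal illustration, not the lexer's code: the function names, the abbreviated keyword tables and the choice of dialects are all assumptions, and the real lexer works on a Pygments token stream rather than on Python dicts. It does not address the still-open item (knowing which objects are schemas at all).

```python
# Hypothetical, heavily abbreviated keyword tables; a real lexer needs the
# full vocabulary of each dialect.
DRAFT4 = "http://json-schema.org/draft-04/schema#"
DRAFT7 = "http://json-schema.org/draft-07/schema#"
DRAFT2020 = "https://json-schema.org/draft/2020-12/schema"

KEYWORDS = {
    DRAFT4: {"$schema", "id", "type", "properties", "items"},
    DRAFT7: {"$schema", "$id", "type", "properties", "items", "if", "then", "else"},
    DRAFT2020: {"$schema", "$id", "$anchor", "type", "properties", "prefixItems"},
}

# Drafts 3 and 4 spell the identifier "id"; later drafts use "$id".
IDENTIFIER = {DRAFT4: "id", DRAFT7: "$id", DRAFT2020: "$id"}


def effective_dialect(obj, parent_dialect, is_document_root=False):
    """Return the dialect URI that governs the schema object `obj`."""
    declared = obj.get("$schema")
    if not isinstance(declared, str):
        return parent_dialect  # no $schema: inherit the enclosing dialect
    if declared not in KEYWORDS:
        # Unknown dialect: assume it starts an embedded resource anyway.
        return declared
    if is_document_root or IDENTIFIER[declared] in obj:
        return declared  # an identifier makes this a resource root
    return parent_dialect  # $schema outside a resource root is ignored


def keyword_flags(obj, dialect):
    """Map each property name of `obj` to True if it is a keyword of `dialect`."""
    known = KEYWORDS.get(dialect, set())  # unknown dialect: nothing is a keyword
    return {name: name in known for name in obj}
```

Returning the unknown URI itself (rather than falling back to the parent) is what makes `keyword_flags` classify every property of such a schema as a non-keyword.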

Julian commented 8 months ago

> The function `jsonschema.Draft7Validator.VALIDATORS.keys()` doesn't seem to contain all the keywords for the dialect. E.g., keywords such as `$anchor` and `$vocabulary` are present here: https://www.learnjsonschema.com/2020-12/ but not present in the function output. So we might need to make a list of these keywords manually.

Yeah, I forgot. I was going to say the only ones you'll be missing are the annotation keywords, but you're right: `$anchor` and `$dynamicAnchor` also won't be there, and neither will `then`, since it's handled by `if`.

As I mentioned elsewhere, someday this will live in `jsonschema-specifications`, so do whatever is easiest for you to make sure you've got them, whether that's hardcoding the list from scratch or using `jsonschema.SomeValidator.VALIDATORS.keys() | {"missing", "keywords"}`.
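A sketch of the second option. In real code `VALIDATORS` would be `jsonschema.Draft202012Validator.VALIDATORS` (a dict mapping each validated keyword to its implementation); here it is an abbreviated stand-in so the snippet runs without `jsonschema` installed. The extra set is illustrative and built from the keywords named in this thread plus a few common annotation keywords; it is not a complete list.

```python
# Stand-in for jsonschema.Draft202012Validator.VALIDATORS (abbreviated).
# Values would be the keyword implementations; None is used as a placeholder.
VALIDATORS = {"$ref": None, "$dynamicRef": None, "if": None, "properties": None, "type": None}

# Keywords that never show up in VALIDATORS: identifiers, anchors and
# annotations do no validation themselves, and "then"/"else" are handled by "if".
NON_VALIDATING = {
    "$schema", "$id", "$anchor", "$dynamicAnchor", "$vocabulary", "$comment", "$defs",
    "then", "else",
    "title", "description", "default", "examples", "deprecated", "readOnly", "writeOnly",
}

# dict.keys() returns a set-like view, so "|" produces an ordinary set.
KEYWORDS_2020_12 = VALIDATORS.keys() | NON_VALIDATING
```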

The second thing you need is what I mentioned in the Bowtie pull request: a way to explicitly indicate which dialect to assume for a schema that has no `$schema`.

So you want a classmethod, `JSONSchemaLexer.with_implicit_dialect(dialect: URI) -> JSONSchemaLexer`, that takes a dialect to assume and selects that dialect's keywords, unless the schema ends up containing a `$schema` keyword, in which case you use that URI.
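A minimal sketch of that precedence rule, assuming a plain class instead of a Pygments lexer: `DialectSelectingLexer` and `dialect_for` are hypothetical names, and only `with_implicit_dialect` comes from the suggestion above. The classmethod just records the dialect to assume; an explicit `$schema` in the document always wins.

```python
from typing import Any, Optional


class DialectSelectingLexer:
    """Hypothetical stand-in for the lexer; only the dialect choice is modelled."""

    def __init__(self, default_dialect: Optional[str] = None) -> None:
        # Dialect to assume when a schema has no $schema (None: assume nothing).
        self.default_dialect = default_dialect

    @classmethod
    def with_implicit_dialect(cls, dialect: str) -> "DialectSelectingLexer":
        """Build a lexer that assumes `dialect` unless the schema declares one."""
        return cls(default_dialect=dialect)

    def dialect_for(self, schema: Any) -> Optional[str]:
        # An explicit $schema string takes precedence over the implicit dialect.
        if isinstance(schema, dict) and isinstance(schema.get("$schema"), str):
            return schema["$schema"]
        return self.default_dialect
```

For example, `DialectSelectingLexer.with_implicit_dialect("https://json-schema.org/draft/2020-12/schema").dialect_for({"type": "string"})` returns the 2020-12 URI, while a schema carrying a 2019-09 `$schema` gets 2019-09.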

But I'd add that ASAP. Once you have it, we can think about merging this on the Bowtie side, since Bowtie has that information (which dialect to assume) to provide.

Julian commented 8 months ago

Oh, and since I didn't say it, nice progress already!

sudo-jarvis commented 8 months ago

@Julian, I added support for specifying a dialect when instantiating `JSONSchemaLexer`. `JSONSchemaLexer()` creates the default lexer with no dialect specified, while `JSONSchemaLexer(dialect_uri)` creates a lexer with the specified dialect.

Samples:

1. When nothing is specified: [Screenshot: 2024-03-02 at 6:52:55 PM]

2. When no `$schema` keyword is present but the specified dialect is `2020-12`: [Screenshot: 2024-03-02 at 6:49:20 PM]

3. When the schema's `$schema` specifies `2019-09` but the explicitly specified dialect is `2020-12`, the `$schema` keyword takes precedence: [Screenshot: 2024-03-02 at 6:49:50 PM]

Julian commented 8 months ago

Very nice. Will have a look at the code but screenshots look good.

Another issue to be aware of is that not all dialects support boolean schemas, so that's another thing we could highlight correctly as an edge case.

Specifically, drafts 3 and 4 do not, so if you encounter a boolean where a schema belongs, we could highlight it in an error color. But let's do that kind of thing later; I think it will require knowledge of which keywords contain schemas. (Maybe open an issue for it, if you don't mind.)
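The "which keywords contain schemas" knowledge could look roughly like this. It is a sketch under several assumptions: it covers only an abbreviated set of draft 4 applicator keywords (draft 3 and keywords such as `dependencies` are left out), it works on parsed JSON rather than tokens, and `boolean_schema_paths` is a hypothetical name. Draft 4's `additionalProperties` and `additionalItems` legitimately accept `true`/`false`, so they are deliberately not in the tables.

```python
# Abbreviated draft 4 tables of where subschemas live (illustrative only).
SINGLE = {"not"}                                         # value is one schema
ITEMS_LIKE = {"items"}                                   # a schema or an array of schemas
ARRAYS = {"allOf", "anyOf", "oneOf"}                     # array of schemas
MAPPINGS = {"properties", "patternProperties", "definitions"}  # object of schemas


def _escape(token):
    # JSON Pointer escaping: "~" -> "~0", "/" -> "~1".
    return str(token).replace("~", "~0").replace("/", "~1")


def boolean_schema_paths(schema, path=""):
    """Yield JSON Pointers where a bare boolean sits in a schema position."""
    if isinstance(schema, bool):
        yield path
        return
    if not isinstance(schema, dict):
        return
    for key, value in schema.items():
        here = f"{path}/{_escape(key)}"
        if key in SINGLE or (key in ITEMS_LIKE and not isinstance(value, list)):
            yield from boolean_schema_paths(value, here)
        elif key in ARRAYS or key in ITEMS_LIKE:
            for i, item in enumerate(value if isinstance(value, list) else []):
                yield from boolean_schema_paths(item, f"{here}/{i}")
        elif key in MAPPINGS and isinstance(value, dict):
            for name, sub in value.items():
                yield from boolean_schema_paths(sub, f"{here}/{_escape(name)}")
```

A lexer would still have to map each reported pointer back to the corresponding token before it could apply an error style.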

Julian commented 8 months ago

Nice! Let's merge this and see how it goes. Well done.