Open handrews opened 6 years ago
First, I'm not sure I understand your example. There are a lot of properties and values in there that you don't explain:
relevantTypes (okay, this one's pretty self-explanatory)
applicator and its properties
annotation (implies that this keyword counts as an annotation?)
in-place vs local
remote vs child
subschemas vs adjacentOnly
I've been away from this for about 6 months working on other projects, and coming into this fresh I am confused.
I also think you're getting dangerously close to implementing (or at least defining) logic in JSON here. It's important to note that while $keyword may describe a new user-defined keyword's relationship to other keywords, it doesn't define the meaning behind the keyword.
For example, a $keyword entry for minimum could define that it applies to numeric JSON values only, but it can't possibly define minimum's validation logic. That has to be coded into the implementation.
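To make that split concrete, here is a minimal Python sketch. KEYWORD_METADATA, VALIDATORS and check_keyword are hypothetical names invented for illustration, not part of any proposal or library. The metadata table stands in for what a $keyword entry might declare, while the validation logic has to be supplied separately as code.

```python
# Hypothetical metadata, standing in for what a $keyword entry might declare.
KEYWORD_METADATA = {
    "minimum": {"relevantTypes": ["number"]},
}

# The validation logic itself cannot be data; it is registered as code.
VALIDATORS = {
    "minimum": lambda limit, instance: instance >= limit,
}


def is_number(instance):
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(instance, (int, float)) and not isinstance(instance, bool)


TYPE_CHECKS = {"number": is_number}


def check_keyword(keyword, value, instance):
    """Return True or False, or None when the keyword's logic is unknown."""
    types = KEYWORD_METADATA.get(keyword, {}).get("relevantTypes")
    if types is not None and not any(TYPE_CHECKS[t](instance) for t in types):
        return True  # the keyword does not apply to this instance type
    validator = VALIDATORS.get(keyword)
    if validator is None:
        return None  # metadata alone cannot decide validity
    return validator(value, instance)
```

With this sketch, check_keyword("minimum", 5, "abc") passes because the metadata says the keyword only applies to numbers, but a keyword with no registered validator can only yield None.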
What is an implementation to do when it doesn't know how to validate a keyword? It can know that the keyword is there and its relationship with other keywords, but it can't validate instances that use the keyword. Is that of any use?
I'd like to show a complete example, but as I mentioned before, I am befuddled by yours, so I'll fudge it where I don't understand. Feel free to correct.
Let's try to define the if/then/else paradigm of draft 7 but in draft 6 (and assuming the existence of your $keyword stuff).
Given a new meta-schema of my own user-level design:
{
  "$id":"http://my.awesome.schema/draft-01/schema#",
  "$schema":"http://json-schema.org/draft-06/schema#",
  "allOf":[{"$ref":"http://json-schema.org/draft-06/schema#"}],
  "properties":{
    "if":{
      "allOf":[{"$ref":"#"}],
      "$keyword":{
        "annotation":true,
        "relevantTypes":["object"]
      }
    },
    "then":{
      "allOf":[{"$ref":"#"}],
      "$keyword":{
        "annotation":true,
        "relevantTypes":["object"],
        "dependsOn":{
          "annotations":{
            "if": ???
          }
        }
      }
    },
    "else":{
      "allOf":[{"$ref":"#"}],
      "$keyword":{
        "annotation":true,
        "relevantTypes":["object"],
        "dependsOn":{
          "annotations":{
            "if": ???
          }
        }
      }
    }
  }
}
My awesome draft-6-with-custom-keyword-compatible schema validator reads this in and recognizes that
"if", "then", and "else" are all user-defined keywords,
"then" and "else" are dependent upon "if"... somehow.
There is no way to tell the implementation that
"then" and "else" are to be ignored when "if" is absent,
when "if" passes, apply "then" and ignore "else", and
when "if" fails, apply "else" and ignore "then".
These are things that the implementation must understand in a way that cannot (and should not) be encoded within the JSON. And because we're leaving out the behavior of these keywords, we can't be certain that any implementation will handle them correctly. Each implementation will need to be independently instructed on how to handle these new keywords.
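For reference, the behavior that would have to be coded into each implementation is small. The sketch below is a toy, not any real validator: validate and apply_conditional are hypothetical names, instances are assumed to be numbers, and only minimum, maximum and the three conditional keywords are understood.

```python
def validate(schema, instance):
    """Toy validator for numeric instances only."""
    if "minimum" in schema and instance < schema["minimum"]:
        return False
    if "maximum" in schema and instance > schema["maximum"]:
        return False
    return apply_conditional(schema, instance)


def apply_conditional(schema, instance):
    # "then" and "else" are ignored entirely when "if" is absent.
    if "if" not in schema:
        return True
    # The result of "if" only selects a branch; it never fails validation itself.
    branch = "then" if validate(schema["if"], instance) else "else"
    if branch not in schema:
        return True
    return validate(schema[branch], instance)


schema = {
    "if": {"minimum": 10},
    "then": {"maximum": 20},
    "else": {"minimum": 0},
}
```

None of the three rules in apply_conditional can be recovered from a declaration that merely says "then" and "else" depend on "if".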
Even if we were to bake evaluation rules into the "if" dependency declarations:
{
  ...
  "else":{
    "allOf":[{"$ref":"#"}],
    "$keyword":{
      "annotation":true,
      "relevantTypes":["object"],
      "dependsOn":{
        "annotations":{
          "if": {
            "applyOn":"failure"
          }
        }
      }
    }
  }
  ...
}
should we? I seem to recall you vehemently arguing against logic in JSON in the past.
Note: I'm not sure if the allOf is necessary in my keyword definitions. Can $ref be adjacent to $keyword?
Great questions @gregsdennis! I'll have to come back to this tomorrow for a full response, but for a few key points:
unevaluatedProperties needs to run after all in-place applicators, so we need to be able to express that requirement, and keywords also need to indicate whether they are in-place applicators so an implementation knows the correct order for evaluating them.
The $keyword terms more or less mean what they say: all of those terms are used either in this issue, or in PRs #595 and/or #600. I haven't sorted out exactly how they should work; at this point this is just to give a general impression of how things might look.
I've been thinking on this more and we may be able to get away with a less complicated $keyword or even none at all.
Really, the issue is that in-place applicators need to go before other kinds of keywords, as in-place applicators are the only keywords that can gather more annotations from this same point in the instance. I can't come up with other keyword classification-based orderings right now.
That doesn't solve per-keyword ordering (additional* and if/then/else), but we've gotten by OK with those handled as special cases so far, so maybe we can keep doing that?
I'd like to avoid overcomplicating the first draft with vocabulary support.
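The simpler rule (in-place applicators first, everything else afterwards) could be sketched as follows. The IN_PLACE_APPLICATORS set and the evaluation_order function are illustrative assumptions; how an implementation would actually learn the classification is exactly what is still open here.

```python
# Assumed classification for illustration only. A real implementation would
# take this from the vocabulary definition or from hardcoded knowledge.
IN_PLACE_APPLICATORS = {"allOf", "anyOf", "oneOf", "not", "$ref"}


def evaluation_order(schema):
    """Return the schema's keywords with in-place applicators first.

    sorted() is stable, so keywords keep their original relative order
    within each of the two groups.
    """
    return sorted(schema, key=lambda keyword: keyword not in IN_PLACE_APPLICATORS)
```

For example, a schema object listing unevaluatedProperties, properties and allOf in that order would be processed as allOf, unevaluatedProperties, properties. Per-keyword orderings such as additional* would still need special-casing on top of this.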
@handrews I was mulling over this, and I think the main hesitation that I have is the idea of the keyword meta-data being defined in the meta-schema. That just seems "bloaty" to me, making the meta-schema difficult to read. Perhaps if this information were to be defined in a separate document, I'd be more open to it.
Perhaps something like this:
{
"$id":"http://my.awesome.schema/draft-01/schema#",
"$keywords":"http://my.awesome.schema/draft-01/keywords#",
...
}
Then the keywords document would contain the pertinent information relating to each of the properties (stealing from your example above), but no validation stuff:
{
  "if":{
    "annotation":true,
    "relevantTypes":["object"]
  },
  "then":{
    "annotation":true,
    "relevantTypes":["object"],
    "dependsOn":{
      "annotations":{
        "if": ???
      }
    }
  },
  "else":{
    "annotation":true,
    "relevantTypes":["object"],
    "dependsOn":{
      "annotations":{
        "if": ???
      }
    }
  },
  ...
}
Of course, I'd want to put in a $schema and $id, but this isn't a schema, it's data that amends the meta-schema.
@handrews BTW, I also support extending schemas. It's not documented, though. I should probably do that.
In the linked test, I extend the draft-06 schema to add if/then/else keywords.
@gregsdennis a separate document could be a reasonable approach for the $vocabularies keyword (and then that document would have $keywords or something in it).
Right now in #561 the proposal is that the $vocabularies URIs identify meta-schemas that just describe that vocabulary. But I've been uncertain if that's the right approach. Particularly because it's not entirely clear how they should be combined (allOf vs anyOf, for example), which leads to the overall meta-schema having to specify them as both $vocabularies and within an allOf or similar.
I don't think we need an elaborate keywords object for draft-08, as noted in my last comment, but these are interesting ideas and might point us in a different direction for #561.
I think an interesting test case for this feature is how easy it would be to define negative applicator keywords:
nand = { "not" : { "allOf" : [ ... ] } }
nor = { "not" : { "anyOf" : [ ... ] } }
xnor = { "not" : { "oneOf" : [ ... ] } }
Now this doesn't necessarily cover custom keywords, where the actual logic is defined in a specification (like the maximum keyword), but it's interesting nonetheless.
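Because these three keywords are pure compositions of existing ones, an implementation could in principle support them by rewriting the schema before evaluation. The sketch below assumes that approach; NEGATED and desugar are hypothetical names, and only the top level of a schema object is rewritten.

```python
# Each negative applicator maps to the standard applicator it negates.
NEGATED = {"nand": "allOf", "nor": "anyOf", "xnor": "oneOf"}


def desugar(schema):
    """Rewrite top-level nand/nor/xnor into "not" around the standard keyword."""
    rewritten = [
        {"not": {NEGATED[keyword]: value}}
        for keyword, value in schema.items()
        if keyword in NEGATED
    ]
    result = {k: v for k, v in schema.items() if k not in NEGATED}
    if rewritten:
        # Collected under allOf so the rewrite cannot clash with an existing "not".
        result["allOf"] = list(result.get("allOf", [])) + rewritten
    return result
```

A keyword like maximum has no such rewrite, which is why it still needs logic coded into the implementation.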
BTW, extending schemas is now easier with my new implementation. The client just has to write an implementation for a keyword and register it. Presto!
I'm now starting to think that defining a simple vocabulary description format would be beneficial for draft-08. It would do three things:
format, content* etc. (#563)
This would actually make it slightly easier to do these as annotations in a meta-schema (because you need to recognize in-place applicators to have any hope of making that work), but I still firmly believe that vocabulary term lists and meta-schemas serve distinct purposes. I am only likely to change my view if someone else shows a fully working counter-proposal.
At this point, I am pretty certain that we will wait to attempt any sort of formal description of application semantics in a vocabulary file of some sort until at least draft-09, so we can get feedback on the basics of vocabularies first. So I'm bumping it to the next milestone.
(apologies for earlier comment with the wrong number- deleted and re-added to send email again)
Since unevaluatedProperties (#556) and unevaluatedItems (#557) depend on the results of other keywords, not just in the immediate schema object but in subschemas, we need to decide how extension keywords can or cannot impact that behavior. There are two cases:
New object child applicators or array child applicators
New in-place applicators
For child applicators, as noted in https://github.com/json-schema-org/json-schema-spec/issues/530#issuecomment-392608099, we should not allow them to change the behavior of unevaluated*. This follows from additionalProperties and additionalItems, which do not change as a result of new keywords. They are defined just in terms of properties/patternProperties or items.
For new in-place applicators, which could contain *properties or *items keywords, the situation is more complex.
TL;DR: we want *properties and *items to affect unevaluated* even when they are in subschemas of an extension in-place applicator.
Example
In our brave new world of multi-vocabulary schemas, let's pretend someone decides to create an extension keyword patternSchemaDependencies which is a cross between patternProperties and schemaDependencies (the old schema form of dependencies). So, if the instance is an object, and at least one property matches a pattern in patternSchemaDependencies, then that pattern's subschema is applied to the current instance location, making it an in-place applicator.
Consider this schema using that keyword (and assume patternSchemaDependencies is properly declared in the meta-schema referenced by $schema, and in whatever vocabulary stuff we come up with, and that the implementation will only process the schema if it understands the extension vocabulary, etc. see #561 for details)
Should {"foooo": 1, "bar": "hello"} be valid or invalid?
My intuition says that it should be valid. unevaluatedProperties applies to properties that have never had a subschema from properties, patternProperties, additionalProperties, or another unevaluatedProperties applied to them.
In this example, because of patternSchemaDependencies, the "bar" property is covered by the schema at #/patternSchemaDependencies/^foo/properties/bar.
The problem
The reason this might not work is that we (presumably) did not know about patternSchemaDependencies when we wrote the spec for unevaluatedProperties. So the implementation might not know that it could affect the behavior of unevaluatedProperties.
If it happens to check patternSchemaDependencies first, this won't matter: as explained in https://github.com/json-schema-org/json-schema-spec/issues/530#issuecomment-392608099, the properties keyword in its subschema would put the property name "bar" in the "properties" annotation, and unevaluatedProperties would notice it and exclude it from its applicable set.
However, if the implementation happens to check unevaluatedProperties before it checks patternSchemaDependencies, then the annotation results for "properties" at that point will not include "bar" (or anything else, in this example). So unevaluatedProperties will apply its false subschema to "bar", which will fail validation, and patternSchemaDependencies will never even be checked.
So not only would it seem counter-intuitive (to me, at least) for this to fail, it's actually non-deterministic. It depends entirely on the keyword evaluation order, which is not constrained by the spec.
Implementation burden
How could an implementation possibly know that it needs to check patternSchemaDependencies before unevaluatedProperties? Of course, if the implementation only supports a fixed set of known vocabularies, the implementation author could hardwire patternSchemaDependencies and any other known in-place applicators as being checked before unevaluatedProperties.
That is totally acceptable for fixed-vocabulary implementations, and I expect many will go this route.
However, it breaks down if someone wants to make a generically extensible implementation where 3rd-parties can register handlers for new vocabularies and keywords at runtime. This is not a hypothetical situation; Ajv's custom keyword support does exactly this already.
Of course, an extensible implementation's interface could provide a way to pass in such information when registering the keyword. However, leaving this interface to individual implementations to design will lead to variable quality and ease of use levels, increasing the barrier to adoption of extensions.
For that matter, needing to figure out the registration design is a significant task that probably discourages making implementations extensible in the first place.
A solution
Fortunately there's nothing magical about patternSchemaDependencies, specifically. All in-place applicators will have this effect, whether they are keywords like allOf that we know about now, or keywords of this sort added in the future by 3rd parties.
Generally a keyword should either affect things based on its classification (in this example, all present and future in-place applicators, regardless of specific behavior, are involved), or based on the specific keyword itself (in which case, as with additionalProperties depending on properties and patternProperties, the relevant keywords are enumerated in the specification).
With #561 vocabulary support, we now have a way to indicate that we are defining schema keywords. We can tag these keyword definitions with various properties in the meta-schema. The structure of these tags would provide a standard interface for writing extensible implementations.
Presumably, most implementations would be passed the relevant meta-schemas as part of their extension loading sequence, and would retrieve them by recognizing the vocabulary URI at runtime (similar to how most implementations pre-package the standard meta-schemas rather than dynamically resolving them from somewhere).
We could add a keyword description object (KDO), and a keyword called keyword or $keyword that takes that object as a value. I'm suggesting an object, similar to links with its array of LDOs, as the information in the KDO will probably be processed very differently from other keywords. I could also see using the prefixed compound word form, but this does feel distinct enough for an object.
Solution example
It could look something like this (off the top of my head without much thought to the syntax, so while we can discuss syntax as part of the overall solution, complaints about minor details will be ignored for now- syntax is always solvable).
This example shows the declaration of an in-place applicator (allOf), plus a child applicator that depends only on specific keywords (additionalProperties) and one that depends on both specific properties and on a whole class of keywords (unevaluatedProperties).
The specific keyword dependencies are notated in terms of the annotations produced by that keyword, which is how such dependencies are now described in the specification. Annotation values are read either from adjacent keywords only, or from subschemas in addition to adjacent keywords.
Note that when relevantTypes is absent, the keyword applies to all possible instance types.
To explain the schemaLocation part, note that $ref (which would not be in the same vocabulary) would have {"instanceLocation": "in-place", "schemaLocation": "remote"}. Since the classification dependency for unevaluatedProperties only mentions instanceLocation, that means that it matches regardless of the value of schemaLocation. This is very hand-wavy and I have not thought through all implications. I am sure that there will be a way to work it out.
There's obviously a lot more that could be done in this area, and we need to figure out what is so essential that it needs to be in draft-08, and what can be deferred. But I think that this mechanism is a key part of enabling schema designers to write their own vocabulary, and have a viable chance of that vocabulary becoming interoperable across multiple implementations.
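One possible reading of the classification matching described for instanceLocation and schemaLocation is a simple subset test: a dependency matches a keyword when every field the dependency mentions has the same value in that keyword's classification. The sketch below assumes that reading. The $ref entry uses the values given above; the entries for allOf and properties, and the names CLASSIFICATIONS, matches and keywords_matching, are assumptions added for illustration.

```python
# Classifications per keyword. Only the $ref values come from the text;
# the allOf and properties values are assumed.
CLASSIFICATIONS = {
    "$ref": {"instanceLocation": "in-place", "schemaLocation": "remote"},
    "allOf": {"instanceLocation": "in-place", "schemaLocation": "local"},
    "properties": {"instanceLocation": "child", "schemaLocation": "local"},
}


def matches(dependency, classification):
    """True when every field the dependency mentions has the same value."""
    return all(
        classification.get(field) == value
        for field, value in dependency.items()
    )


def keywords_matching(dependency):
    """Names of all known keywords whose classification matches, sorted."""
    return sorted(
        keyword
        for keyword, classification in CLASSIFICATIONS.items()
        if matches(dependency, classification)
    )
```

A dependency of {"instanceLocation": "in-place"} then selects both $ref and allOf, since schemaLocation is left unconstrained, and any in-place applicator registered later would be picked up without changing the dependency.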
Should [$]keyword be part of core?
Should the keyword description object be part of core or its own vocabulary? I'd say that this will be determined by whether we consider extensible implementations a fundamental part of the JSON Schema system. If they are, then we need $keyword to bootstrap the system. If they are not, then we can make this a separate thing that only extensible implementations need to support.