Closed handrews closed 1 year ago
Note that while I definitely have an opinion on this, if there's a clear majority in favor of giving contentSchema location behavior then I'll go with that. I filed this to have a more focused discussion, not to fight to the death on it.
I'm not too sure of the distinction between this and #1288; they both seem primarily concerned with treating contentSchema's value as data vs. subschema. I commented there, and what I said seems relevant both there and here. I'll reiterate the bits that are particularly about contentSchema as a subschema other schemas can $ref to, or as its own entity that may separately become a schema.
I think it is good to treat contentSchema purely as schema-shaped data, not at all a subschema, in the context of the schema or resource containing it. Treated as a subschema, contentSchema doesn't do much: no validation, no in-place application, and it doesn't describe its instance (until that instance is parsed to something else). The only thing it does is what @handrews wants to prevent: $ref-ing into it by anchor or by pointer, if it is a subschema without an $id.
As just data, it should become a schema only considered as its own detached document, with the rules that apply to root schemas. This is a bit different than resource subschemas (subschemas with an $id).
contentSchema value to the schema or resource containing the contentSchema will not work.
Some further thoughts on describing this in the metaschema if it is not a thing $ref/$anchor interact with: I mentioned on the other issue that contentSchema's value being described as a schema by the metaschema would be a problem, at least for my implementation, as I use that to determine what is a schema to collect $anchor from and what to consider valid to $ref into. (Though, if "location keyword" becomes a concept exclusive to $defs, that might be different.)
The metaschema could almost use the content* mechanism to describe contentSchema, if we disregard for a moment that the content* keywords only apply to strings. The metaphor is almost the same: contentSchema describes instance data, but is not to be applied to the data in situ, only once some processing has been done. This also describes contentSchema's own data: the schema describing it is the metaschema, but it is not to be applied in situ, only once it has been detached, given its retrieval URI, and treated as a root schema.
  {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://json-schema.org/draft/2020-12/meta/content",
    "$vocabulary": {"https://json-schema.org/draft/2020-12/vocab/content": true},
    "$dynamicAnchor": "meta",
    "title": "Content vocabulary meta-schema",
    "type": ["object", "boolean"],
    "properties": {
      "contentEncoding": { "type": "string" },
      "contentMediaType": { "type": "string" },
-     "contentSchema": { "$dynamicRef": "#meta" }
+     "contentSchema": {
+       "contentMediaType": "application/schema+json",
+
+       // encoding is the wrong description but I think the right metaphor; this is how
+       // the content (schema-shaped json data) is within the instance (a schema).
+       "contentEncoding": "json",
+
+       "contentSchema": {
+         "$id": "content/contentSchema",
+         // ref is not dynamic - contentSchema is independent and has new, empty dynamic scope
+         "$ref": "/draft/2020-12/schema"
+       }
+     }
    }
  }
Initially this seemed to me an interesting diversion to think about but one without practical use: a slight stretching of the metaphor, unusable because the content* keywords do apply only to strings and I am not advocating a change to that. But then @handrews said:
> An alternative would be to change contentSchema from taking a schema to taking a string containing a schema. Which I almost did when I added it to ensure it was treated as data. If anyone likes this idea please speak up, but I am not expecting it to be popular.
If contentSchema were a string, the above would basically be working. It lets the metaschema describe the instance without indicating that it is a subschema. It makes it clear to readers and authors that the value is not like a subschema. I have some negative feeling toward putting JSON data as a string inside other JSON data, but I think it does really fit better as a string. And the $ref to the metaschema is not dynamic, though I'm not sure that is more of a problem than any other way to describe contentSchema in the metaschema would be.
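To make the string alternative concrete, here is a minimal Python sketch. The schema, the property name `payload`, and the helpers `detach_content_schema` and `collect_anchors` are all hypothetical and deliberately naive; 2020-12 does not define contentSchema as a string. The point is only that a string value is opaque to anything walking the containing schema, and becomes a schema only once it is parsed as its own document.

```python
import json

# Hypothetical schema in which contentSchema holds a string containing a schema,
# as in the alternative quoted above (not how 2020-12 defines the keyword).
schema = {
    "$id": "https://example.com/schema",
    "properties": {
        "payload": {
            "type": "string",
            "contentMediaType": "application/json",
            "contentSchema": json.dumps({"$anchor": "inner", "type": "object"}),
        }
    },
}


def detach_content_schema(value):
    """Parse a string-valued contentSchema into a standalone root schema."""
    return json.loads(value)


def collect_anchors(node):
    """Naively collect every $anchor found by walking dicts and lists.

    Strings are never descended into, so a string-valued contentSchema
    contributes nothing to the containing schema.
    """
    found = []
    if isinstance(node, dict):
        if isinstance(node.get("$anchor"), str):
            found.append(node["$anchor"])
        for child in node.values():
            found.extend(collect_anchors(child))
    elif isinstance(node, list):
        for child in node:
            found.extend(collect_anchors(child))
    return found


anchors_in_parent = collect_anchors(schema)
detached = detach_content_schema(schema["properties"]["payload"]["contentSchema"])
anchors_in_detached = collect_anchors(detached)
```

With this shape, the `inner` anchor is invisible from the parent and only exists in the detached document.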
@notEthan thanks for replying and copying over that text.
> However, lacking an $id has the problem @handrews noted that its retrieval URI is the same as its parent's, if present.
That's not quite correct: It has a different retrieval URI (differing by the fragment), but since using a URI as a base URI disregards the fragment, that means that they end up with the same base URI. Proper resolution relative to that base URI would take the context schema into account, but if the context schema is not accessible (e.g. an API request evaluated an instance, got the annotations, and sent them back without providing access to the context schema), the application would not be able to resolve relative references into the context schema.
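The fragment point can be checked with any RFC 3986 resolver. The following is a small sketch using Python's standard `urllib.parse`; the URIs follow the example.com locations used elsewhere in this thread, and the reference strings `#/$defs/item` and `other` are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical locations for a parent schema and a contentSchema inside it.
parent_uri = "https://example.com/schema"
content_schema_uri = parent_uri + "#/properties/whatever/contentSchema"

# The two retrieval URIs differ, but only in the fragment.
base_without_fragment, fragment = urldefrag(content_schema_uri)

# Reference resolution ignores the base URI's fragment, so references found
# inside the contentSchema resolve relative to the parent resource.
resolved_pointer_ref = urljoin(content_schema_uri, "#/$defs/item")
resolved_relative_ref = urljoin(content_schema_uri, "other")
```

Here `resolved_pointer_ref` points into the parent schema's `$defs`, which is only useful to an application that still has access to that parent.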
Because the annotation data (at least as of the next release which will strengthen the annotation output requirements) MUST include the schema location with a JSON Pointer fragment, references within the extracted contentSchema schema could theoretically be resolved correctly, although that would require doing additional work rather than just handing it off to the usual URI reference resolution code.
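A rough sketch of what that additional work might look like, in Python. Everything here is an assumption for illustration: the shape of the `annotation` record and its `schemaLocation` key, the example schema, and the premise that the application can obtain the context schema from the base URI (here it is simply held in a variable). `resolve_pointer` is a plain RFC 6901 evaluator written for this example, not a library call.

```python
from urllib.parse import unquote, urldefrag


def resolve_pointer(document, pointer):
    """Evaluate an RFC 6901 JSON Pointer against parsed JSON."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("not a JSON Pointer: %r" % pointer)
    node = document
    for token in pointer[1:].split("/"):
        # RFC 6901 unescaping order: ~1 first, then ~0.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Stand-in for the context schema, assumed to be retrievable by its base URI.
context_schema = {
    "$id": "https://example.com/schema",
    "$defs": {"item": {"type": "integer"}},
    "properties": {
        "whatever": {
            "contentMediaType": "application/json",
            "contentSchema": {"$ref": "#/$defs/item"},
        }
    },
}

# Hypothetical annotation record: the extracted value plus its schema location.
annotation = {
    "schemaLocation": "https://example.com/schema#/properties/whatever/contentSchema",
    "value": {"$ref": "#/$defs/item"},
}

base_uri, location_pointer = urldefrag(annotation["schemaLocation"])
# URI fragments may be percent-encoded; decode before using as a JSON Pointer.
extracted = resolve_pointer(context_schema, unquote(location_pointer))

# The fragment-only $ref resolves against base_uri, i.e. into the context schema.
ref_pointer = unquote(annotation["value"]["$ref"].lstrip("#"))
target = resolve_pointer(context_schema, ref_pointer)
```

The sketch only handles fragment-only JSON Pointer references; anchors and references to other resources would need the full resolution machinery.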
> though, if "location keyword" becomes a concept exclusive to $defs, that might be different
Yes, that's what #1306 is about, although it's not that it would be exclusive to $defs, it's that $defs is the only current keyword that has that behavior without being an applicator. So implementing #1306 would allow us to declare contentSchema to have that behavior in addition to its annotation behavior. This would support @jdesrosiers's position that it should be a normal schema without requiring special keyword-specific handling. Which is why I'm open to that outcome: while it's not my preference, if it can be described without keyword-specific hacks, I'll be OK with it.
I believe that we should accept #1306 as it is important for reasons beyond contentSchema, which is why it's a separate issue from this one, and why we could accept it and not necessarily give contentSchema schema location behavior.
I've been considering this off and on for a while, including the comments from @notEthan.
I have come to the reluctant conclusion that it is better to give "contentSchema" location behavior, as @jdesrosiers prefers (although not expressed in those words).
Using a "contentSchema" that lacks an "$id" and/or a "$schema" based on an annotation that gives its location as something like "https://example.com/schema#/properties/whatever/contentSchema" is no different than starting validation from an identical schema at something like "https://example.com/schema#/$defs/whatever". In both cases, you must:
The only difference between the "contentSchema" and "$defs" case is that the "contentSchema" schema is extracted as an annotation value, and removed from its context. There are a number of other possible solutions for that which can be discussed in #1288, including simply not extracting the schema as an annotation value and requiring that the user have access to the original schema (which would save memory as well).
At this point, I am convinced by @jdesrosiers's assertions that treating "contentSchema" "like a schema" is less confusing than treating it differently, which means giving "contentSchema" location behavior. If #1306 is accepted, then we can do that explicitly with those words, but it is characterized as a schema already so technically we do not have to change anything.
@notEthan you're welcome to object to the PR or continue to raise arguments here, btw. I should have left those comments up for a while before making a PR, but was having a bit of an off day and going on auto-pilot.
For the purpose of this issue (and consistent with #1306), "schema location behavior" means that a keyword indicates that some part(s) of its value are schemas and MUST be recognized as such by an implementation. Being "recognized" means that an implementation knows to scan it for $id, $anchor, etc. and associates the IRIs they create with the schema, along with the JSON Pointer fragment IRI (it's irrelevant whether any of this is done on load or at runtime).
$defs only has schema location behavior. 2020-12 classifies $defs as a location keyword, but the concept of "location keyword" is somewhat muddled. Schemas located through this behavior can be targeted by $ref (or anything else that might reference schemas with an IRI). I think we generally agree that $ref-ing into applicators is a bad practice, but we don't (currently) forbid it. TBH, I wouldn't mind forbidding it, but I suspect I'm in the minority.
contentSchema is defined as an annotation, and was not intended to have schema location behavior.
$ref-ing into contentSchema is definitely at least as bad a practice as $ref-ing into an inline applicator.
My preference would be to forbid it by saying that contentSchema lacks this behavior. Framing it in terms of schema location behavior would make this part of the JSON Schema system rather than a weird exception.
As noted in #1288, @jdesrosiers would prefer that contentSchema have schema location behavior.
I'd like to get more opinions on this point, which doesn't change the outcome of #1288. So it's not necessary to read through all of that issue.
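As an illustration of what "recognized" could mean in practice, here is a hedged Python sketch of an indexer that walks a schema and records the IRIs under which each recognized schema can be referenced. The keyword sets are a small illustrative subset rather than the 2020-12 list, `index_schemas` and `extra_single` are names invented for this example, and details such as percent-encoding of pointer fragments or indexing embedded resources under their parent's pointer are left out.

```python
from urllib.parse import urljoin

# Illustrative subset of keywords whose values are (or contain) schemas.
SINGLE_SCHEMA = {"not", "items", "additionalProperties"}
SCHEMA_MAP = {"$defs", "properties"}
SCHEMA_LIST = {"allOf", "anyOf", "oneOf"}


def escape(token):
    """Escape one JSON Pointer reference token (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def index_schemas(schema, base_uri, pointer="", extra_single=frozenset(), index=None):
    """Map IRIs (pointer fragments, $id, $anchor) to recognized schemas."""
    if index is None:
        index = {}
    if isinstance(schema, dict) and isinstance(schema.get("$id"), str):
        # A new resource: its $id becomes the base and pointers restart.
        base_uri = urljoin(base_uri, schema["$id"])
        pointer = ""
    index[base_uri + "#" + pointer] = schema
    if not isinstance(schema, dict):
        return index  # boolean schema, nothing to descend into
    if isinstance(schema.get("$anchor"), str):
        index[base_uri + "#" + schema["$anchor"]] = schema
    for keyword, value in schema.items():
        here = pointer + "/" + escape(keyword)
        if keyword in SINGLE_SCHEMA or keyword in extra_single:
            index_schemas(value, base_uri, here, extra_single, index)
        elif keyword in SCHEMA_MAP:
            for name, sub in value.items():
                index_schemas(sub, base_uri, here + "/" + escape(name), extra_single, index)
        elif keyword in SCHEMA_LIST:
            for position, sub in enumerate(value):
                index_schemas(sub, base_uri, here + "/" + str(position), extra_single, index)
    return index


schema = {
    "$id": "https://example.com/schema",
    "$defs": {"named": {"$anchor": "named", "type": "string"}},
    "properties": {
        "whatever": {
            "type": "string",
            "contentMediaType": "application/json",
            "contentSchema": {"$anchor": "inner", "type": "object"},
        }
    },
}

# contentSchema without schema location behavior: its contents are not scanned.
without_location = index_schemas(schema, "https://example.com/schema")
# contentSchema with schema location behavior: scanned like any other schema.
with_location = index_schemas(
    schema, "https://example.com/schema", extra_single={"contentSchema"}
)
```

In this sketch the whole question reduces to whether `contentSchema` is in the set of keywords the indexer descends into: without it, `#inner` and the pointer into `contentSchema` are simply not valid `$ref` targets.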
An alternative would be to change contentSchema from taking a schema to taking a string containing a schema. Which I almost did when I added it to ensure it was treated as data. If anyone likes this idea please speak up, but I am not expecting it to be popular.