json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/
Other
3.82k stars 266 forks source link

contentSchema has implementation-defined referencing behavior when contentMediaType is not present #1381

Open Julian opened 1 year ago

Julian commented 1 year ago

The text of 2020-12's contentSchema says:

The value of this property MUST be a valid JSON schema. It SHOULD be ignored if "contentMediaType" is not present.

In #1288 which I've found, there was some back and forth about what the first sentence there means (which I agree with -- i.e. that the current spec seems to treat this schema as any other subschema, by virtue of omission). But the second sentence still has undefined behavior, because "SHOULD" there isn't clear.

Specifically, this schema (or others like it) when validating the instance 12:

{
  "$ref": "#foo",
  "contentSchema": {
      "$anchor": "foo",
      "type": "string"
  }
}

either blows up if one follows the SHOULD or else returns invalid if one ignores the SHOULD about contentMediaType being missing. (FWIW of the implementations Bowtie supports, there appear to be at least 1 implementation with both behaviors, though unclear whether they explicitly intend to do so).

It seems unnecessary to have undefined behavior in this case (one reminiscent of old $ref), so my personal opinion would be to remove that second sentence (and not intimate that it's ok to ignore it), or to just make it invalid to have contentSchema present without contentMediaType (and add a dependentRequired to the metaschema).

(n.b. #1352 already tweaked the language here coming out of #1288, but the second sentence is still present.)

gregsdennis commented 1 year ago

blows up if one follows the SHOULD

I don't follow your logic here. Following the SHOULD means that contentSchema is ignored in your example. I see nothing wrong with that.

It doesn't mean that identifiers aren't still registered, just that the keyword isn't evaluated.

Julian commented 1 year ago

Here's the language from draft 7's $ref definition:

All other properties in a "$ref" object MUST be ignored.

where it was essentially universally understood that this means identifiers are not registered. As far as a quick skim, no other language is present there, that's how we chose to communicate that behavior.

So I think you're asserting something (that there's only one interpretation for the current language) which isn't consistent with historical wording.

But at worst even if you're right there I'd still think it's unclear rather than really undefined.

jdesrosiers commented 1 year ago

I think the intention was that that quote is in the context of using the contentSchema keyword as an assertion (which is optional, off-by-default behavior). It doesn't mean that the contentSchema sub-schema should be ignored, it means that it shouldn't apply contentSchema as an assertion on the parsed string data. I think it's a "SHOULD" in order to allow for but discourage heuristic-based parsing of the string value.

So, I think it would be wrong for your example to produce an error. The text is certainly ambiguous and should be clarified in the next release, but I weight intention pretty high when there is ambiguity in the text and I'm pretty confident that was the intention.

Julian commented 1 year ago

Fair enough maybe it's not needed for us to argue or agree on how to interpret the current draft (I suspect we probably all agree it's quite unlikely anyone has implemented anything related to this at all, let alone anything depending on the small nonsense point I'm raising) so perhaps consider this just a "it seems like this line needs tweaking for the next version" ticket.

gregsdennis commented 1 month ago

@Julian @jdesrosiers is there an action here? Do we need to clarify anything in the current text?

Julian commented 1 month ago

If it were me my recommendation was to remove that second sentence given that we no longer allow contentSchema to be used as an assertion, and/or to make it invalid to have contentSchema without contentMediaType in the text and metaschema, but it's fairly minor given I think nearly no one does anything with these keywords so if others don't agree I'm fine if you close.

gregsdennis commented 1 month ago

We did receive a question about this in Slack in the past week, so I'll look at how it can be made more explicit, but I don't think any requirements need to change.

jdesrosiers commented 3 weeks ago

I think whether or not this needs action depends on the resolution of #1288.

According to Henry, the example given wouldn't be valid. The schema in contentSchema is not a subschema. It's an annotation that happens to be in the shape of a schema. So, the $ref in that example is invalid because it's referencing a location that is known to not be a schema. That's true regardless of whether contentMediaType is present.

However, if it's decided that contentSchema is a subschema of the schema it appears in, then this becomes an issue we should probably address in the next release.

gregsdennis commented 3 weeks ago

I agree that it's worth revisiting that conversation. Primarily, I'd like to identify (good) use cases for considering it a subschema.

I can't think of any good reasons someone would want to $ref into that subschema (i.e. creating an $id in that subschema that is then referenced from somewhere else), however it might be handy to be able to $ref out of it, e.g. into $defs.

If it's not a subschema, then even something as simple as

{
  // ...
  "contentSchema": { "$ref": "#/$defs/my-content" },
  "$defs": {
    "my-content": { /* ... */ }
  }
}

wouldn't be possible.

jdesrosiers commented 3 weeks ago

There are other considerations as well. For instance, contentSchema not being a subschema violates the principle of least surprise. People will assume this is a normal subschema and be confused when they find it's not, which will lead to a lot of questions we have to answer. Another consideration is ease of implementation. It will vary given different architectures, but I know one option would be problematic for me to implement while the other would require nothing at all.