Open awwright opened 1 year ago
Would you include YAML as a non-JSON input?
Yes, YAML also has a larger value space than JSON. For example, it supports circular references (Anchors and Aliases)—there's no way to encode this to JSON and then back to YAML. However, for a certain subset of YAML, you can just convert it to JSON, then validate that. (Or some equivalent calculation, if you want to optimize away the "conversion to JSON" step.)
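To make the circular-reference point concrete, here is a minimal sketch using only Python's standard library. The self-referential list is an assumption standing in for what a YAML loader could produce when an alias points back into its own anchored node; `try_json` is a hypothetical helper name, not part of any validator.

```python
import json

# A list that contains itself, standing in for a YAML node whose
# alias refers back to its own anchor (assumed loader output).
node = ["child"]
node.append(node)

def try_json(value):
    """Return JSON text for value, or None if it has no JSON form."""
    try:
        return json.dumps(value)
    except ValueError:
        # json.dumps raises ValueError when it detects a circular reference.
        return None

# An acyclic value from the "certain subset" that converts cleanly.
acyclic = {"name": "example", "tags": ["a", "b"]}

print(try_json(node))     # no JSON representation exists
print(try_json(acyclic))  # converts, and could then be validated as JSON
```

The acyclic value round-trips through JSON unchanged, so validating the converted form is equivalent to validating the original.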
Maybe this can be addressed: "Non-JSON formats may be validated if there is a single correct representation as JSON. Values without a JSON representation will either be indistinguishable, or cause an error." Maybe that's enough guidance?
I think that's why we state that we operate on the JSON data model. I believe there's already text that says JSON Schema can operate in any format that maps into that data model.
Well, that's the paragraph I'm proposing to remove, at least from core. (Again that wouldn't suggest you can't pass alternate serializations to a validator, just that it's out of scope to describe in core.)
Related to this, I was thinking that "data model" could be simplified too. The data model is something I introduced to address the fact that the same value in JSON can be represented in multiple different ways. But the section is largely a paraphrase of the instance equality section; it may be easier just to say "the data model distinguishes JSON documents that are not instance equal."
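As a rough illustration of what "instance equal" means in practice, the sketch below compares two parsed JSON values. It is my reading of the instance equality rules (null, booleans, strings, numbers by mathematical value, arrays item by item, objects by key set and per-key value), not normative text, and `instance_equal` is a hypothetical name.

```python
import json

def instance_equal(a, b):
    """Compare two parsed JSON values under the JSON data model."""
    # Check booleans first: in Python, bool is a subclass of int,
    # so True == 1 would otherwise be treated as a numeric match.
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    if a is None or b is None:
        return a is None and b is None
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a == b  # 1 and 1.0 are the same mathematical value
    if isinstance(a, str) and isinstance(b, str):
        return a == b
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            instance_equal(x, y) for x, y in zip(a, b)
        )
    if isinstance(a, dict) and isinstance(b, dict):
        # Property order is not significant; only keys and values are.
        return a.keys() == b.keys() and all(
            instance_equal(a[k], b[k]) for k in a
        )
    return False

# Two different serializations of the same instance.
left = json.loads('{"a": 1, "b": [1.0, null]}')
right = json.loads('{ "b": [1, null], "a": 1.0 }')
print(instance_equal(left, right))
```

Two documents that differ only in whitespace, property order, or number spelling compare as equal here, which is the distinction the data model section is trying to capture.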
And then after this, we can re-examine how non-JSON formats fit into this, maybe by specifying how a non-JSON document can be compared for instance equality to a JSON document.
Like I mentioned above, this issue may be a good place to consolidate "Instance Data Model" and "Instance Equality" into a single section. Each section is describing essentially the same concept just in different terms.
this is an interesting fact to point out. However, this [...] is somewhat outside the scope of JSON Schema, and so should be removed.
I agree. In fact, I think this kind of thing happens a lot in the spec and it would be nice to clean some of these things up.
@awwright What, precisely, are you proposing be removed: that whole statement, or just the bit at the end about CBOR?
Action is to remove the phrase highlighting CBOR. I think the rest is pertinent.
The extent to which JSON Schema can be used to validate data structured as a non-JSON input isn't defined well enough. The spec currently says
In my personal opinion, this is an interesting fact to point out. However, this isn't enough guidance to ensure that different implementations are compatible. Additionally, it is somewhat outside the scope of JSON Schema, and so should be removed.
If this should be written into the standard, it should go into more detail about how this works technically. For any JSON-compatible format, there should be an isomorphism to JSON, or there should be guidance on how to handle the larger value space (for example, CBOR provides data tags, which applications might like to distinguish).
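One way to picture the "larger value space" problem is a mapping function that has to pick a policy for values with no JSON counterpart. This is only a sketch under assumptions: `Tagged` is a hypothetical stand-in for a tagged value as a CBOR decoder might expose it, and `to_json_model` and its `on_tag` parameter are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import Any

@dataclass
class Tagged:
    """Hypothetical stand-in for a tagged value from a richer format."""
    tag: int
    value: Any

def to_json_model(value, on_tag="error"):
    """Map a decoded value into the JSON data model, or raise ValueError.

    on_tag="strip" drops tags, so tagged and untagged values become
    indistinguishable; on_tag="error" rejects tagged values outright.
    """
    if isinstance(value, Tagged):
        if on_tag == "strip":
            return to_json_model(value.value, on_tag)
        raise ValueError(f"tag {value.tag} has no JSON representation")
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError("NaN and infinities have no JSON representation")
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, list):
        return [to_json_model(v, on_tag) for v in value]
    if isinstance(value, dict):
        if not all(isinstance(k, str) for k in value):
            raise ValueError("object keys must be strings")
        return {k: to_json_model(v, on_tag) for k, v in value.items()}
    raise ValueError(f"{type(value).__name__} has no JSON representation")

# The tag number is illustrative; the payload is an arbitrary integer.
doc = {"when": Tagged(1, 1700000000)}
print(to_json_model(doc, on_tag="strip"))
```

Two implementations that pick different policies would disagree on whether `doc` is even valid input, which is the compatibility gap the paragraph above describes.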
But I think the best option is to remove this for now, and publish guidance on handling non-JSON inputs separately.
Closes #1274