Closed handrews closed 2 years ago
I'm against this change as it is written; I've found it useful in OpenAPI validation. I understand that in some situations it might be a security risk, so it's reasonable to have its support be optional (as it already is), and to default to NOT performing validation. I've been thinking of splitting it into two vocabularies: Annotation and Validation. Like with format
, it can be enabled when desired, and left disabled most of the time.
To clarify: I'm fully onboard with changing the spec to clarify that an author of a schema should never be unsure whether content*
will be validated or not -- implementations should NOT validate by default, so that leaves either something in $vocabulary to indicate that validation is desired, or an out-of-band configuration variable that defaults to false (which is what I've done in my implementation, so it can be used in certain situations where appropriate decoding algorithms have been provided to the implementation).
@karenetheridge this is not about content*
having validation behavior (which it does not). This is about the complexity of shoving an entire second (but totally independent) validation process into an interface that doesn't allow it.
You could still create a validating content vocabulary after this change.
@karenetheridge also, the "automatic feed-back-to-validate" process can trivially be implemented as a library on top of any JSON Schema implementation that supports annotation collection and any standard output beyond "flag". This PR is about taking something that is better handled as a separate library out of the spec. It just doesn't need to be here, and there is no other reason to add this much complexity to the basic interface of JSON Schema.
@karenetheridge since the content*
keywords in the https://json-schema.org/draft/next/content
vocabulary are strictly annotations, there are two ways to validate string contents while remaining in compliance with the spec:
content*
annotations are attached
contentEncoding
(the spec gives no guidance on handling any errors)contentType
(the spec gives no guidance on handling any errors)contentSchema
as a new evaluation processcontent*
annotations are present in this new outputcontent-assertion
extension vocabulary analogous the the format-assertion
vocabulary, which would perform the encoding/parsing/additional validation as part of the regular (single) evaluation processOption 1 can be done in two ways: Shoved into the main JSON Schema implementation, creating a completely different set of output requirements from everything else (as currently suggested by the spec). Or it can be done as a separate library, without burdening the JSON Schema implementation's output interface that works for literally everything else in JSON Schema.
There is no functional difference whatsoever between doing it within the implementation or as a separate library. Please let me know if you do see a difference, because I would like to understand that. I am proposing this change because fundamentally, it's not a change. It just moves non-essential functionality to a library, which could even have its own little specification to ensure interoperability of this area. That would be an improvement over the current situation.
Option 2 is not impacted by this change at all. Supporting a contentSchema
assertion-applicator is tricky as it is not clear how to represent instance locations within a string, so for now any such feature would not be interoperable, but we could figure that out if we wanted to make it interoperable.
I don't understand your references to output requirements in Option 1. What does output have to do with anything here?
The issue I have with moving this functionality out of the spec is, if we only talk about annotations in the spec, we lose the information needed to implement these keywords as validations consistently.
But, more importantly, if we "..take the output of the completed initial evaluation..", then we can't use the results of evaluating the content* keyword(s) and use them as inputs to applicator keywords one level up in the document. So I see Option 2 as much more preferrable.
it is not clear how to represent instance locations within a string.. but we could figure that out if we wanted to make it interoperable
The string is decoded (using the specified media type) into something that matches the JSON document model, so I presumed (when implementing this) that the instance location just carries on using json pointers inside the decoded data structure. That's under-specified though, so it would be good to clarify that (assuming the use of validation stays in the spec at all).
But, more importantly, if we "..take the output of the completed initial evaluation..", then we can't use the results of evaluating the content* keyword(s) and use them as inputs to applicator keywords one level up in the document
The spec has never allowed this since contentSchema
was introduced in 2019-09. It's expressly forbidden:
Implementations MAY offer the ability to decode, parse, and/or validate the string contents automatically. However, it MUST NOT perform these operations by default, and MUST provide the validation result of each string-encoded document separately from the enclosing document.
You're currently not allowed to use that result within the validation of the enclosing document. It MUST be produced as separate output results, which puts content*
assertion behavior into a totally unique processed divorced from normal implementation.
This PR removes that uniqueness. If anything, it makes what you want to do easier by removing this constraint.
So I see Option 2 as much more preferrable.
That's great! This PR actually makes that easier by removing the part that more-or-less forbids that.
I don't hear any actual objection from you regarding what this PR does. If anything, this sounds like support. You're saying "no, don't merge this", but also "my preferred solution is the thing that is easier if we merge this."
so I presumed (when implementing this) that the instance location just carries on using json pointers inside the decoded data structure. That's under-specified though, so it would be good to clarify that (assuming the use of validation stays in the spec at all).
You should file an issue for that, as it is not relevant to this PR. It was underspecified before, and it will be the same level of under-specified afterwards.
It was underspecified before, and it will be the same level of under-specified afterwards.
Well no, not really - if you take out the descriptions of how the content* keywords would validate, then that leaves the reader to make up their own (inconsistent) rules. ..But on looking again, I see you're not removing those paragraphs.
I don't hear any actual objection from you regarding what this PR does. If anything, this sounds like support. You're saying "no, don't merge this", but also "my preferred solution is the thing that is easier if we merge this."
I guess? I don't see any path forward to allowing in-place evaluation as assertions, other than creating a new vocabulary (akin to the format-annotation and format-validation vocabulary split in 2020-12), but I'd been pondering that myself and it seems reasonable.
So I guess my vote is "not-no" - I'm not terribly happy with the spec staying in the state after merging this, but the parts you are removing are parts that I definitely disagreed with. If that makes sense :)
@karenetheridge
the parts you are removing are parts that I definitely disagreed with. If that makes sense :)
Yes, exactly! This PR is about removing a bad solution to the problem you want to solve. It takes no position on other possible solutions, and you are quite welcome to file proposals for other solutions. I would actually suggest a slightly different approach to validation, using more specific keywords. A major problem with these a validation keywords is that's they're so open-ended. But if you specifically want to validate JSON-in-JSON, or or parse HTML-in JSON, etc., you can write a more clear spec for that behavior.
But that's a discussion for another issue (or discussion).
I'll go ahead and merge this, then. Two "yes"es and a "not-no" seem sufficient :-)
This feature added substantial complexity and security exposure for essentially no benefit. The functionality could be trivially implemented as a library on top of a JSON Schema implementation that supports annotation collection.
Fixes #1287. Intentionally avoids topics still under discussion in #1288.