@handrews - I think that it's very important to specify this.
As a user, it's a major advantage to be able to use JSON Schema with many tools for many purposes. I can use it for a documentation generator on my website or a code generator, and publish it in third-party directories which might have their own UIs. If the behavior of, for example, `description` is not specified, then the tools may behave differently, and I can no longer effectively guarantee the format of my descriptions in generated documentation.
This isn't an arbitrary problem.
In early versions of Swagger 2, the `default` keyword was left open to interpretation. As a result, `default` meant the example value in Smartdocs (which was used for documentation generation on a website) but the default value to use if no input is provided in some code generators. (The latter is now the official meaning in Swagger.)
The result was that users in India complained that generated code automatically performed a flight search from Boston to London if the user provided no input, simply because of this default value in the documentation, which is obviously not ideal behavior!
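To make the ambiguity concrete, here is a made-up Swagger 2 query parameter (not from the API in question; the parameter name and value are purely illustrative):

```json
{
  "name": "origin",
  "in": "query",
  "type": "string",
  "default": "BOS",
  "description": "IATA code of the departure airport"
}
```

A documentation tool could reasonably render `"BOS"` as an example value, while a code generator could just as reasonably substitute it whenever the caller omits `origin`: two very different behaviors driven by a single underspecified keyword.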
@tadhgpearson I think I wasn't clear. I think specifying this makes sense; I'm just not convinced it should be specified in the validation spec. There is a larger discussion going on about whether the whole "Meta-data keywords" section belongs in the validation spec at all. That won't change in draft-07, but it is being kicked around, in part for exactly this reason: the community wants these things specified more, but such specification isn't really about data validation.
By moving it to a separate (but still "official", whatever that means) specification, we can address the real concerns around these keywords without limiting validation behavior. All of your examples are about generating docs, code or UI. That's what the new specification(s) would focus on.
None of your examples have anything to do with validation, unless I'm missing something.
What we've come to realize is that validation and data definition are not the same thing, and a lot of the difficulty various tools struggle with has to do with finding a way to support concepts that are really only useful for validation. Or dealing with people complaining that they are not supported, even when the use cases for data definition are unclear.
@tadhgpearson That's not really related to `$use` though. That's more an issue of `default` being ambiguously specified, resulting in schema authors using it to mean several different things. If `default` is properly specified and/or split into multiple keywords, then the problem goes away.

The same comment applies to other metadata keywords, although I think `default` is the most severe example.
@erayd Correct, absolutely not related to `$use`.

To clarify: my example above was related to the `default` in Swagger 2 input parameters, and was not related to JSON Schema's `default`. I was just using it as a way to highlight the issues that occur when field usage is left without a clear specification.
@erayd Yup. We toyed with the multiple-keywords approach for `default`, but it bogged down. I think that was another side effect of trying to specify meta-data keywords like validation keywords.

For data definition, I think the specifications end up being a bit looser. We need to avoid things like accidentally using examples as runtime defaults (`examples` obviously helps with that). But when combining schemas with meta-data keywords, there may be a number of reasonable strategies that different applications might choose. We can then recommend a range of options as long as they don't actively clash with each other. Or specify things like: `default` MAY be used as a runtime default inserted into data, but such usage MUST be controlled by user configuration. Or something like that; I just made that up, don't take it too seriously.
@handrews Mmm, there's a whole big discussion to be had there, I think. Ambiguity around `default` is the issue that motivated me to get involved with the spec in the first place. Draft-09 perhaps?
@erayd it will probably go over to json-schema-org/json-schema-vocabularies. Which is currently set up for three vocabularies for (docs, UI, code) generation. I'm starting to think that there is actually a basic data definition spec to work out, which may work for some and form the basis for others (obviously annotating whether an enum should be a drop-down or a set of radio buttons is UI-specific).
That would also let it evolve independent of the existing schedule. I expect it would be driven by a slightly different set of participants, just as we have people who look primarily at hyper-schema, and people who only care about validation.
@erayd @tadhgpearson we actually already have an initial proposal from someone, which you can see here: https://github.com/json-schema-org/json-schema-vocabularies/pull/4
There's some discussion already, and we'll pick it up more after draft-07 (but feel free to comment).
@handrews I like the sound of that.
At the end of the vote-a-rama, I said that I would consolidate these issues to focus the discussion for draft-08. I've filed #515 which outlines the two possible competing approaches. It's pretty abstract, but once we choose an approach, deciding which exact keyword and behavior we want should be less controversial. Therefore I'm closing this in favor of #515.
Use Case and Motivation
This is one of several proposals to come out of the analysis of the "ban additional properties mode" vs. `$merge`/`$patch` debate and the various use cases that fed into that.

One common use case identified was specifying information about a type's usage at the point of use rather than as part of the initial type definition. In some cases, it is reasonable to provide some generic usage information in the type definition, which serves as default usage documentation but needs to be overridden in some uses with more specific information.
Additionally, `default` cannot meaningfully appear in multiple branches of logical keywords such as `allOf`, as there is no sensible way to choose which value is correct for the specific usage. Conflicting validations in an `allOf` simply fail the `allOf`, but conflicting `default` values are just unusable, independent of validation.
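As a quick illustration (an invented schema, not taken from any real project), both branches here validate the same instances without conflict, yet there is no principled way to pick a single `default` for `quantity`:

```json
{
  "allOf": [
    {
      "properties": {
        "quantity": { "type": "integer", "default": 1 }
      }
    },
    {
      "properties": {
        "quantity": { "minimum": 0, "default": 0 }
      }
    }
  ]
}
```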
Prior experience
This particular use case has been a major pain point on both large-scale JSON Schema projects with which I have been involved.
One implemented `$merge`, which was very effective, but (as several people have noted, and as I now agree) it is too powerful, as it (and `$patch`) can literally convert any schema into an arbitrarily different schema, or even into a non-schema JSON value. In this project, `$merge` was used to override default values for `default`, `title`, `description`, and `readOnly`, or disambiguate among several possible values for those keywords.

The other project ended up redefining many types that should be sharing a definition. In some cases, nearly every schema has had to redefine a particular sub-schema just to change the title and description.
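As a sketch of how such a `$merge`-based override looked (the referenced definition and the annotation values are invented here; `$merge` itself was only ever a proposal):

```json
{
  "$merge": {
    "source": { "$ref": "#/definitions/address" },
    "with": {
      "title": "Billing address",
      "description": "Address to which the invoice is sent.",
      "readOnly": true
    }
  }
}
```

The same mechanism could just as easily have rewritten `type` or `required`, which is exactly the excess power this proposal avoids.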
Preserving independent validation
In order to preserve the self-contained nature of validation, usage overriding only applies to the keywords in the "Metadata keywords" section of the validation specification (`title`, `description`, and `default`) plus the usage-oriented hyper-schema keywords (`readOnly` and `pathStart`).

Breaking self-contained validation has been the main objection to other proposals in this area, either through overly powerful keywords or through changing validation modes in ways which are not reflected in the schema itself. This is the only proposal to date that solves this problem with no impact on validation whatsoever.
Existing workarounds
It is possible to use an `allOf` with a reference to the type schema and a schema that includes only the point-of-use annotations. This is tolerable when reading the schema, but is not good for documentation generation. Each schema within an `allOf` should be self-contained, which separates the documentation from the thing it's documenting. Additionally, if default annotation fields have been filled out, a documentation tool that attempts to extract and combine annotation data from all `allOf` branches will just produce conflicting documentation.

Finally, `allOf` is nonsensical when combining multiple schemas using `default`, `readOnly`, or `pathStart`, as there is no reasonable way to choose which value to use.
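A sketch of that workaround (the referenced definition and annotation values are invented for illustration):

```json
{
  "allOf": [
    { "$ref": "#/definitions/address" },
    {
      "title": "Billing address",
      "description": "Address to which the invoice is sent."
    }
  ]
}
```

A human reader can connect the annotations to the referenced type, but a documentation generator sees two sibling schemas, one of which is nothing but annotations, and if `#/definitions/address` also carries a `title`, the two values simply conflict.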
The proposal: $use
The keyword `$use` would be an annotation/hyper-schema keyword indicating that a type (via a `$ref`) is being used with additional annotation and/or hypermedia for this specific use. `$use` is an object with two required properties and one optional property; the format for the required properties is borrowed from the `$merge` and `$patch` proposals:

- `source` MUST be a schema, and will nearly always be a `$ref`, as there is no need for `$use` if you have the literal schema defined inline. Required.
- `with` should be applied to the resolved `source` as an `application/merge-patch+json` instance, which MUST NOT affect any JSON Schema-defined keyword that is not explicitly allowed by this proposal. Required.
- Implementations may apply `$use` to their extensions or not as they choose.

JSON Merge Patch considerations
Declaring the `with` object to follow JSON Merge Patch semantics allows implementations to use existing libraries, and frees us up from needing to debate yet another schema combination format.

However, a notable limitation of JSON Merge Patch is that you cannot overwrite something to `null`, as specifying a `null` value instead deletes the key entirely. A workaround for this that is consistent with existing JSON Schema behavior is to define `null` somewhere (perhaps right in the `$use` object, since it does not forbid additional properties) and then `$ref` it. We would need to declare that `$use` is processed before `$ref`, which is fine because `$ref` often needs lazy evaluation due to circular references anyway.
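A sketch of that workaround (assuming this `$use` sits at the schema root so the JSON Pointer resolves; the `theNull` property name is invented):

```json
{
  "$use": {
    "theNull": null,
    "source": { "$ref": "#/definitions/quantity" },
    "with": {
      "default": { "$ref": "#/$use/theNull" }
    }
  }
}
```

Because the merge patch contains a `$ref` object rather than a literal `null`, the `default` key survives the patch, and once the `$ref` is resolved (after `$use` is processed, per the ordering above) its value is `null`.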
Elsewhere, there's been an assertion that `$ref` must always resolve to a schema. This would require breaking that assumption. Since the assumption was not present in Draft 04 (in which JSON Reference was an entirely separate specification), I think this isn't too unreasonable. A compromise could be that it can reference either a schema or null.

If `$ref`-ing null doesn't gain acceptance, another option would be an additional keyword in the `$use` object listing JSON Pointers to properties that should be set to null.

"$" in name
I chose `$use` rather than `use` on the grounds that keywords that manipulate the schema itself should stand out (I will be filing a more comprehensive issue on the use of `$` later, as it comes up in yet another not-yet-filed proposal, plus there is an open issue to consider `$id` in place of `id`).

Example
This shows that the type can be concisely defined in a shared location, with a clear usage object applied to it.
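A minimal sketch of such a usage (the shared `address` definition, property names, and annotation values are all invented for illustration):

```json
{
  "definitions": {
    "address": {
      "type": "object",
      "title": "Address",
      "properties": {
        "street": { "type": "string" },
        "city": { "type": "string" }
      }
    }
  },
  "type": "object",
  "properties": {
    "billingAddress": {
      "$use": {
        "source": { "$ref": "#/definitions/address" },
        "with": {
          "title": "Billing address",
          "description": "Address to which the invoice is sent.",
          "readOnly": true
        }
      }
    }
  }
}
```

Validation of `billingAddress` is exactly the validation of `#/definitions/address`; only the point-of-use `title`, `description`, and `readOnly` differ.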
Meta-schema considerations
This could be added to the meta-schema by moving most of the keyword definitions into the meta-schema's `definitions` section, grouped by type of keyword, and defining `with` as a schema minus the forbidden keywords.
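A rough sketch of what the `$use` entry in the meta-schema's `properties` might look like, under the assumption that the metadata keywords have been grouped into a `metaDataKeywords` definition (the names and the allow-list approach here are illustrative, not part of the proposal text):

```json
{
  "$use": {
    "type": "object",
    "required": ["source", "with"],
    "properties": {
      "source": { "$ref": "#" },
      "with": { "$ref": "#/definitions/metaDataKeywords" }
    }
  }
}
```

Here `#/definitions/metaDataKeywords` would describe an object limited to `title`, `description`, `default`, `readOnly`, and `pathStart`, approximating "a schema minus the forbidden keywords" as an allow-list instead.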