json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Overriding annotation/usage keywords at point of use #98

Closed: handrews closed this issue 6 years ago

handrews commented 8 years ago

Use Case and Motivation

This is one of several proposals to come out of the analysis of the "ban additional properties mode" vs $merge/$patch debate and the various use cases that fed into that.

One common use case identified was specifying information about a type's usage at the point of use rather than as part of the initial type definition. In some cases, it is reasonable to provide some generic usage information in the type definition which serves as default usage documentation, but needs to be overridden in some uses with more specific information.

Additionally, default cannot meaningfully appear in multiple branches of logical keywords such as allOf, as there is no sensible way to choose which value is correct for the specific usage. Conflicting validations in an allOf simply fail the allOf, but conflicting default values are just unusable, independent of validation.

Prior experience

This particular use case has been a major pain point on both large-scale JSON Schema projects with which I have been involved.

One implemented $merge, which was very effective but, as several people have noted and as I now agree, is too powerful: it (and $patch) can convert any schema into an arbitrarily different schema, or even into a non-schema JSON value. In this project, $merge was used to override the type's own values for default, title, description, and readOnly, or to disambiguate among several possible values for those keywords.

The other project ended up redefining many types that should be sharing a definition. In some cases, nearly every schema has had to redefine a particular sub-schema just to change the title and description.

Preserving independent validation

In order to preserve the self-contained nature of validation, usage overriding only applies to the keywords in the "Metadata keywords" section of the validation specification (title, description, and default) plus usage-oriented hyper-schema keywords (readOnly and pathStart).

Breaking self-contained validation has been the main objection to other proposals in this area, either through overly powerful keywords or changing validation modes in ways which are not reflected in the schema itself. This is the only proposal to date that solves this problem with no impact on validation whatsoever.

Existing workarounds

It is possible to use an allOf with a reference to the type schema and a schema that includes only the point-of-use annotations. This is tolerable when reading the schema, but it is not good for documentation generation. Each schema within an allOf should be self-contained, so this approach separates the documentation from the thing it is documenting. Additionally, if the type schema already fills in default values for the annotation fields, a documentation tool that extracts and combines annotation data from all allOf branches will just produce conflicting documentation.

Finally, allOf is nonsensical when combining multiple schemas using default, readOnly, or pathStart, as there is no reasonable way to choose which value to use.
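To illustrate the documentation problem, here is a Python sketch of the naive approach: a hypothetical tool that collects annotation keywords from every allOf branch. The schemas and values are made up; the point is that the shared type's title and default collide with the point-of-use ones, and nothing in the schema says which should win.

```python
import json

# Hypothetical schemas: a shared type that carries its own annotations, combined
# in allOf with a point-of-use schema that tries to override them.
schema = {
    "definitions": {
        "specialTimestamp": {
            "type": "string",
            "title": "Timestamp",
            "default": "1970-01-01T00:00:00Z",
        }
    },
    "allOf": [
        {"$ref": "#/definitions/specialTimestamp"},
        {"title": "Last Event of Interest", "default": "2000-01-01T00:00:00Z"},
    ],
}

# Annotation/usage keywords named in this issue.
ANNOTATIONS = ("title", "description", "default", "readOnly", "pathStart")


def resolve_local(root, ref):
    """Follow a same-document "#/..." reference (JSON Pointer escapes not handled)."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = root
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def collect_annotations(root):
    """Gather every value each annotation keyword takes across the allOf branches."""
    found = {}
    for branch in root.get("allOf", []):
        if "$ref" in branch:
            branch = resolve_local(root, branch["$ref"])
        for key in ANNOTATIONS:
            if key in branch:
                found.setdefault(key, []).append(branch[key])
    return found


# A keyword conflicts when its branches disagree; there is no principled winner.
conflicts = {
    key: values
    for key, values in collect_annotations(schema).items()
    if len({json.dumps(v, sort_keys=True) for v in values}) > 1
}
print(conflicts)
```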

The proposal: $use

The keyword $use would be an annotation/hyperschema keyword indicating that a type (via a $ref) is being used with additional annotation and/or hypermedia for this specific use.

$use is an object with two required properties, source and with (as used in the example below), and one optional property. The format for the required properties is borrowed from the $merge and $patch proposals.

Declaring that the with object follows JSON Merge Patch (RFC 7396) semantics allows implementations to use existing libraries and frees us from needing to debate yet another schema combination format.

However, a notable limitation of JSON Merge Patch is that you cannot set a value to null: a null value in the patch deletes the key entirely. A workaround that is consistent with existing JSON Schema behavior is to define null somewhere (perhaps right in the $use object, since it does not forbid additional properties) and then $ref it. We would need to declare that $use is processed before $ref, which is fine because $ref often needs lazy evaluation anyway due to circular references.
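A minimal Python sketch of the with semantics and the null workaround, assuming the RFC 7396 merge algorithm and one reading of "$use is processed before $ref": the merge runs first, and any $ref that the with object introduces is resolved afterwards. The property name lastSeen, the nullValue holder, and the default timestamp are hypothetical.

```python
import copy


def merge_patch(target, patch):
    """RFC 7396 JSON Merge Patch; returns a new value and leaves inputs untouched."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)  # null means "delete", never "set to null"
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


def resolve_pointer(doc, ref):
    """Follow a same-document "#/..." JSON Pointer, handling ~0 and ~1 escapes."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = doc
    for token in ref[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_refs(node, doc):
    """Replace every {"$ref": ...} object with its target (no cycle handling)."""
    if isinstance(node, dict):
        if set(node) == {"$ref"}:
            return resolve_pointer(doc, node["$ref"])
        return {k: resolve_refs(v, doc) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve_refs(v, doc) for v in node]
    return node


# Hypothetical document: the null used as an override lives inside the $use object.
doc = {
    "properties": {
        "lastSeen": {
            "$use": {
                "source": {"type": ["string", "null"], "default": "1970-01-01T00:00:00Z"},
                "with": {"default": {"$ref": "#/properties/lastSeen/$use/nullValue"}},
                "nullValue": None,
            }
        }
    }
}
use = doc["properties"]["lastSeen"]["$use"]

# Merge Patch alone: a literal null deletes "default" instead of setting it.
deleted = merge_patch(use["source"], {"default": None})

# Workaround: merge first, then resolve the $ref that the merge introduced.
overridden = resolve_refs(merge_patch(use["source"], use["with"]), doc)
print(deleted)     # {'type': ['string', 'null']}
print(overridden)  # {'type': ['string', 'null'], 'default': None}
```

Note that this only works if $ref is allowed to resolve to null, which is exactly the assumption discussed next.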

Elsewhere, there has been an assertion that $ref must always resolve to a schema. This workaround would require breaking that assumption. Since the assumption was not present in Draft 04 (in which JSON Reference was an entirely separate specification), I think this isn't too unreasonable. A compromise could be that $ref can reference either a schema or null.

If $ref-ing null doesn't gain acceptance, another option would be an additional keyword in the $use object listing JSON pointers to properties that should be set to null.
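The pointer-list alternative could be sketched in Python as follows, assuming a hypothetical keyword named nullify whose entries are RFC 6901 JSON Pointers evaluated against the schema produced by the Merge Patch step. The keyword name and that ordering are assumptions, not part of the proposal.

```python
import copy


def set_null_at(schema, pointer):
    """Set the location named by a JSON Pointer such as "/default" to null."""
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    if not tokens:
        raise ValueError("cannot null out the whole schema")
    parent = schema
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        parent[int(last)] = None
    else:
        parent[last] = None


def apply_nullify(merged, use):
    """Apply the hypothetical "nullify" list after the Merge Patch step."""
    result = copy.deepcopy(merged)
    for pointer in use.get("nullify", []):
        set_null_at(result, pointer)
    return result


# "merged" stands for the output of applying "with" to "source" via Merge Patch.
merged = {"type": ["string", "null"], "default": "1970-01-01T00:00:00Z", "title": "Last seen"}
use = {"nullify": ["/default"]}
print(apply_nullify(merged, use))
```

Unlike the $ref approach, this keeps $ref targets restricted to schemas, at the cost of one more keyword in the $use object.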

"$" in name

I chose $use rather than use on the grounds that keywords that manipulate the schema itself should stand out (I will be filing a more comprehensive issue on the use of $ later as it comes up in yet another not-yet-filed proposal, plus there is an open issue to consider $id in place of id).

Example

{
    "type": "object",
    "properties": {
        "interestTimestamp": {
            "$use": {
                "source": {"$ref": "#/definitions/specialTimestamp"},
                "with": {
                    "title": "Last Event of Interest",
                    "description": "The last time that something interesting happened.  Cannot be directly updated through the API.",
                    "readOnly": true
                }
            }
        }
    },
    "definitions": {
        "specialTimestamp": {…}
    }
}

This shows that the type can be concisely defined in a shared location, with a clear usage object applied to it.
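To make the example concrete, here is a Python sketch of expanding $use: resolve source, then apply with via JSON Merge Patch (RFC 7396). Because the issue elides specialTimestamp, the definition below is a hypothetical stand-in, and expand_use is an illustrative helper rather than a specified algorithm.

```python
import copy


def merge_patch(target, patch):
    """RFC 7396 JSON Merge Patch; returns a new value and leaves inputs untouched."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


def resolve_local(doc, ref):
    """Follow a same-document "#/..." reference (JSON Pointer escapes not handled)."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are supported in this sketch")
    node = doc
    for part in ref[2:].split("/"):
        node = node[part]
    return node


def expand_use(node, doc):
    """Hypothetical helper: the effective schema is source merge-patched by with."""
    use = node["$use"]
    source = use["source"]
    if "$ref" in source:
        source = resolve_local(doc, source["$ref"])
    return merge_patch(source, use["with"])


schema = {
    "type": "object",
    "properties": {
        "interestTimestamp": {
            "$use": {
                "source": {"$ref": "#/definitions/specialTimestamp"},
                "with": {
                    "title": "Last Event of Interest",
                    "description": "The last time that something interesting happened.  "
                                   "Cannot be directly updated through the API.",
                    "readOnly": True,
                },
            }
        }
    },
    "definitions": {
        # Stand-in only: the issue elides the real definition.
        "specialTimestamp": {"type": "string", "format": "date-time", "title": "Special timestamp"},
    },
}

effective = expand_use(schema["properties"]["interestTimestamp"], schema)
print(effective)
```

The shared definition is left untouched; only the point-of-use copy carries the new title, description, and readOnly.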

Meta-schema considerations

This could be added to the meta-schema by moving most of the keyword definitions into the meta-schema's definitions section grouped by type of keyword, and defining with as a schema minus the forbidden keywords.
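As a rough illustration of what restricting with to the overridable keywords means, here is a minimal Python check rather than the meta-schema itself. The keyword set is taken from the "Preserving independent validation" section above; forbidden_keywords is a hypothetical helper name.

```python
# Keywords the proposal allows "with" to override; anything else could change validation.
OVERRIDABLE = {"title", "description", "default", "readOnly", "pathStart"}


def forbidden_keywords(with_obj):
    """Return, sorted, the top-level keys of a "with" object that may not be overridden."""
    return sorted(set(with_obj) - OVERRIDABLE)


print(forbidden_keywords({"title": "x", "readOnly": True}))                # []
print(forbidden_keywords({"title": "x", "type": "integer", "minimum": 0}))  # ['minimum', 'type']
```

In a meta-schema, the same restriction would presumably be expressed by listing only these keywords under properties and disallowing additional properties.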

tadhgpearson commented 7 years ago

@handrews - I think that it's very important to specify this.

As a user, it's a major advantage to be able to use JSON Schema with many tools for many purposes. I can use it with a documentation generator on my website and with a code generator, and publish it in third-party directories that might have their own UIs. If the behavior of, for example, description is not specified, then the tools may behave differently, and I can no longer effectively guarantee the format of my descriptions in generated documentation.

This isn't an arbitrary problem. In early versions of Swagger 2, the default keyword was left open to interpretation. As a result, default meant "example value" in Smartdocs (which was used for documentation generation on a website) but "value to use if no input is provided" in some code generators (the latter is now the official meaning in Swagger). Consequently, users in India complained that generated code automatically ran a flight search from Boston to London when the user provided no input, simply because of a default value intended for the documentation, which is obviously not ideal behavior!

handrews commented 7 years ago

@tadhgpearson I think I wasn't clear. I think specifying this makes sense; I'm just not convinced it should be specified in the validation spec. There is a larger discussion going on about whether the whole "Meta-data keywords" section belongs in the validation spec at all. That won't change in draft-07, but it is being kicked around, in part for exactly this reason: the community wants these things specified more, but such specification isn't really about data validation.

By moving it to a separate (but still "official", whatever that means) specification, we can address the real concerns around these keywords without limiting validation behavior. All of your examples are about generating docs, code or UI. That's what the new specification(s) would focus on.

None of your examples have anything to do with validation, unless I'm missing something.

What we've come to realize is that validation and data definition are not the same thing. A lot of the difficulty that various tools struggle with comes from finding a way to support concepts that are really only useful for validation, or from dealing with people complaining that those concepts are not supported, even when their use cases for data definition are unclear.

erayd commented 7 years ago

@tadhgpearson That's not really related to $use, though. That's more an issue of default being ambiguously specified, resulting in schema authors using it to mean several different things. If default is properly specified and/or split into multiple keywords, then the problem goes away.

The same comment applies to other metadata keywords, although I think default is the most severe example.

tadhgpearson commented 7 years ago

@erayd Correct, absolutely not related to $use. To clarify: my example above was about default in Swagger 2 input parameters, not JSON Schema's default. I was just using it to highlight the issues that occur when a field's usage is left without a clear specification.

handrews commented 7 years ago

@erayd yup. We toyed with the multiple keywords approach for default, but it bogged down. I think that was another side effect of trying to specify meta-data keywords like validation keywords.

For data definition, I think the specifications end up being a bit looser. We need to avoid things like accidentally using examples as runtime defaults (the examples keyword obviously helps with that). But when combining schemas with meta-data keywords, there may be a number of reasonable strategies that different applications might choose. We can then recommend a range of options as long as they don't actively clash with each other. Or we could specify things like: default MAY be used as a runtime default inserted into data, but such usage MUST be controlled by user configuration. Or something like that; I just made that up, so don't take it too seriously.

erayd commented 7 years ago

@handrews Mmm, there's a whole big discussion to be had there I think. Ambiguity around default is the issue that motivated me to get involved with the spec in the first place. Draft-09 perhaps?

handrews commented 7 years ago

@erayd It will probably go over to json-schema-org/json-schema-vocabularies, which is currently set up for three vocabularies: docs, UI, and code generation. I'm starting to think that there is actually a basic data definition spec to work out, which may work for some and form the basis for others (obviously, annotating whether an enum should be a drop-down or a set of radio buttons is UI-specific).

That would also let it evolve independent of the existing schedule. I expect it would be driven by a slightly different set of participants, just as we have people who look primarily at hyper-schema, and people who only care about validation.

handrews commented 7 years ago

@erayd @tadhgpearson we actually already have an initial proposal from someone, which you can see here: https://github.com/json-schema-org/json-schema-vocabularies/pull/4

There's some discussion already, and we'll pick it up more after draft-07 (but feel free to comment).

erayd commented 7 years ago

@handrews I like the sound of that.

handrews commented 6 years ago

At the end of the vote-a-rama, I said that I would consolidate these issues to focus the discussion for draft-08. I've filed #515 which outlines the two possible competing approaches. It's pretty abstract, but once we choose an approach, deciding which exact keyword and behavior we want should be less controversial. Therefore I'm closing this in favor of #515.