json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/
Other
3.43k stars 250 forks source link

Keyword for extending a schema #907

Open awwright opened 4 years ago

awwright commented 4 years ago

I've been trying to implement $recursiveRoot and $recursiveRef myself, and it's pretty gnarly. I think we were trying to keep it super simple for implement, but it ended up having so many rules it's actually difficult to comprehend what's going on.

I have an idea for a different mechanism for "extending" recursive schemas, let's call it "$extends".

When a validator encounters the "$extends" keyword, it follows the reference to the schema with the given URI, and applies that schema as an applicator keyword, the same way "$ref" works—but with one change in behavior:

Any time a sub-instance applies the extended schema (because of a $ref or $extends recursive reference), the extending schema is also applied.

The mechanism to perform this is, I think, straightforward: the validation function just needs to accept an optional argument, a mapping of schema → set of schemas, by default whatever the parent value was, or an empty map (at the document root). The behavior: any validations against a schema in the map, will also validate that instance against all schemas in the mapped set.

This should even allow multiple recursive levels, for example, where one property recurses back to the parent, and a different property recurses up to the grandparent. As long as you use "$extends" for both cases, you should be able to write an extending schema that always gets applied when the base is; and so (afaict) unevaluatedProperties should still work properly.


An example for adventurous parties to wrap their heads around:

{
    $id: "base",
    type: "object",
    required: [ "id" ],
    properties: {
        "id": { type: "string" },
        "fancy": {
            $anchor: "fancy",
            type: "object",
            required: [ "children" ],
            properties: {
                "children": { items: { $ref: "#fancy" } },
                "schema": { $ref: "#" }
            }
        }
    }
}
{
    $id: "hyper",
    $extends: "base#",
    properties: {
        "links": { type: "array", items: { type: "string" } },
        "fancy": {
            $anchor: "fancy",
            type: "object",
            required: [ "rank" ],
            properties: {
                "rank": { type: "integer", minimum: 0 },
                "children": { items: { $ref:"#fancy", $extends: "base#fancy" } }
            }
        }
    }
}

An instance, validated against hyper:

{
    id: "A",
    links: [ "B", "C" ],
    fancy: {
        rank: 1,
        children: [
            {
                rank: 3,
                children: [
                    {
                        rank: 2,
                        schema: {
                            id: "B",
                            links: ["A"]
                        }
                    }
                ]
            }
        ],
        schema: { id: "C" }
    }
}
handrews commented 4 years ago

@awwright To start off with "$id": "base#fancy" and "$id": "#hyper" are not valid draft 2019-09 syntax. Could you please update the examples to conform with the current draft so that I'm sure I'm interpreting what you mean correctly?

It's also not clear to me why this is better than my proposal regarding "$recursiveAnchor" that I put on the slack channel yesterday and had not yet filed because I was waiting to see if there were any reactions to the direction of the idea. I think fixing the $id vs $anchor aspects in your example will make that more clear.

handrews commented 4 years ago

@awwright thanks for fixing the syntax, this is more clear to me.

What I don't understand here is how this solves the problem that $recursiveRef solves. With $recursiveRef and $recursiveAnchor, the hyper-schema dialect meta-schema looks like this (minus $vocabulary and other non-essentials to keep it short):

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/hyper-schema",

   "$recursiveAnchor": true,

    "allOf": [
        {"$ref": "https://json-schema.org/draft/2019-09/schema"},
        {"$ref": "https://json-schema.org/draft/2019-09/meta/hyper-schema"}
    ]
}

In your proposal, I think it would need to be written as:

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/hyper-schema",

    "$extends": "https://json-schema.org/draft/2019-09/schema",

    "$ref": "https://json-schema.org/draft/2019-09/meta/hyper-schema",

    "properties": {
        "additionalItems": { "$ref": "#", "$extends": "schema#" },
        "additionalProperties": { "$ref": "#", "$extends": "schema#"},
        "dependencies": {
            "additionalProperties": {
                "anyOf": [
                    { "$ref": "#", "$extends": "schema#" },
                    { "type": "array" }
                ]   
            }   
        },  
        "items": {
            "anyOf": [
                { "$ref": "#", "$extends": "schema#" },
                { "$ref": "#/definitions/schemaArray" }
            ]   
        },  
        "definitions": {
            "additionalProperties": { "$ref": "#", "$extends": "schema#" }
        },  
        "patternProperties": {
            "additionalProperties": { "$ref": "#", "$extends": "schema#" }
        },  
        "properties": {
            "additionalProperties": { "$ref": "#", "$extends": "schema#" }
        },  
        "if": { "$ref": "#", "$extends": "schema#" },
        "then": { "$ref": "#", "$extends": "schema#" },
        "else": { "$ref": "#", "$extends": "schema#" },
        "allOf": { "$ref": "#/$defs/schemaArray" },
        "anyOf": { "$ref": "#/$defs/schemaArray" },
        "oneOf": { "$ref": "#/$defs/schemaArray" },
        "not": { "$ref": "#", "$extends": "schema#" },
        "contains": { "$ref": "#", "$extends": "schema#" },
        "propertyNames": { "$ref": "#", "$extends": "schema#" }
    },
    "$defs": {
        "schemaArray": {
            "type": "array",
            "items": { "$ref": "#", "$extends": "schema#" }
        }
    }
}

The whole point of $recursiveAnchor and $recursiveRef is to remove the need to re-declare every single recursive reference in the more derived schema. Am I missing something?

awwright commented 4 years ago

@handrews In my example I put $extends next to $ref, but it can go in the target schema just as well—that probably makes more sense. There shouldn't be a difference (I just thought of that form first).

And there's no need to re-declare any of the applicator keywords. If the base schema refers back to the target of $extends, it goes back to the extending schema instead; and it passes this fact along when validating sub-schemas.

Re-declaring the applicator keywords in the extended schema is likely only necessary if multiple schemas need to be in the "extends map".

So, we should have merely this:

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$extends": "https://json-schema.org/draft/2019-09/schema",
    "$ref": "https://json-schema.org/draft/2019-09/meta/hyper-schema"
}

And here's what goes on, let's validate a simple schema:

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "additionalProperties": { "links": [ {"rel": "self", "href":"{+address}"} ] },
}
  1. Program calls Validate(instance, <hyper-schema>, {}), which applies two subschemas to the same instance (steps (2) and (3) below)
  2. validator calls Validate(instance, <meta/hyper-schema>, {}) via $ref
  3. validator calls Validate(instance, <schema>, { <schema> → [ <hyper-schema> ] }) via $extends
  4. Any calls to Validate inside evaluation of <schema> pass along the mapping... so when it handles the additionalProperties subschema, it applies the $ref: "#" keyword: that URI reference resolves to <schema>, and is then passed through the map to become <hyper-schema>. Thus, the validator calls: Validate(instance.additionalProperties, <hyper-schema>, { <schema> → [ <hyper-schema> ] })
handrews commented 4 years ago

@awwright thanks- while I still don't quite get it, I'm working through it and will comment more later.

Just to check, regardless of the syntax I think the primary difference here is that you are passing some sort of additional context down into the base schema, so that certain references resolve into that context (here <hyper-schema>) instead of in the local context (<schema>), while $recursiveAnchor works by setting a marker higher in the execution context, and resolves back to that marked context.

I'm fairly certain you can implement that "higher" marker as a passed-down context in some way, and IIRC @gregsdennis might actually do something like that in his implementation? Whether he does or not, I'd love for his input on which of the way things are now, vs my proposal in #909, vs this proposal, seems simplest to him.

gregsdennis commented 4 years ago

I'm not sure I follow this proposal either. It may be due to my unfamiliarity with hyper-schema, though, but I'm still failing to see how this solves a problem that $recursive* don't, especially considering #909.

gregsdennis commented 4 years ago

As to handling of these keywords in Manatee.Json, I pass a context object through the validation process that keeps track of the current recursive anchor so that the recursive ref can resolve to it.

handrews commented 4 years ago

@awwright I think I'm starting to understand this. I want to do an example in a bit more detail, and I'll refer to the following (meta-)schemas, presented below in abridged form.

I'll go through the actual example in the next comment, this is just the set of files being used.

The only notable difference here is that the dialect schemas extend multiple other schemas, so I changed $extends to take an array. have put multiple $extend keywords in an allOf where needed.

Hyper-Schema dialect

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$extends": [
        "https://json-schema.org/draft/2019-09/schema",
        "https://json-schema.org/draft/2019-09/meta/hyper-schema"
    ]
}

Standard dialect

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://json-schema.org/draft/2019-09/schema",
    "type": ["boolean", "object"],
    "$extends": [
        "https://json-schema.org/draft/2019-09/meta/core",
        "https://json-schema.org/draft/2019-09/meta/applicator",
        "https://json-schema.org/draft/2019-09/meta/validation",
        "https://json-schema.org/draft/2019-09/meta/content",
        "https://json-schema.org/draft/2019-09/meta/format",
        "https://json-schema.org/draft/2019-09/meta/meta-data"
    ]
}

Applicator vocabulary

{
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "$id": "https://json-schema.org/draft/2019-09/meta/applicator",
    "properties": {
        "not": {"$ref": "#"}
    }
}

Hyper-Schema vocabulary

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/meta/hyper-schema",
    "properties": {
        "links": {
            "type": "array",
            "items": {"$ref": "https://json-schema.org/draft/2019-09/links"
        }
    }
}

Link Description Object

{
    "$schema": "https://json-schema.org/draft/2019-09/hyper-schema",
    "$id": "https://json-schema.org/draft/2019-09/links",
    "type": "object",
    "properties": {
        "targetSchema": {"$ref": "https://json-schema.org/draft/2019-09/meta/hyper-schema"}
    }
}
awwright commented 4 years ago

@handrews Thanks for trying to grok this, nice catch with the array form

handrews commented 4 years ago

@awwright I got distracted but will come back with my take from a walk-through with the above schemas tomorrow.

jdesrosiers commented 4 years ago

If I understand this correctly, this looks like it maps to the current $recursiveAnchor/$recursiveRef behavior with one important difference. It seems to me that $extends is like setting "$recursiveAnchor": true and then each schema in $extends is evaluated like all $refs are $recursiveRefs.

The important difference is that all $refs are treated like $recursiveRefs. That would be problematic. Some $refs need to be local. Even some self $refs need to be local even then extended.

I like that this cleans up the syntax, but if I understand this correctly, I don't think it's flexible enough.

handrews commented 4 years ago

OK I managed to go through a couple of different paths here.

TL;DR:

Does it do what it says?

The most interesting path is through the links schema, which can also be used stand-alone (and then becomes a separate entry point into the system as it references a meta-schema). I have it pointing to the hyper-schema vocabulary schema (rather than the dialect one) because that's the more complex case.

If we start from the hyper-schema dialect meta-schema, by the first time we get to the links schema, the map looks something like this (with relative URIs for readability):

<schema> => [<hyper-schema>]
<meta/hyper-schema> => [<schema>]
<meta/core> => [<schema>]
# etc for all vocab meta-schemas

So when we get to "$ref": "meta/hyper-schema":

  1. we look up meta/hyper-schema which is extended by schema
  2. we look up schema which is extended by hyper-schema
  3. we look up hyper-schema, which is not extended
  4. we validate next with hyper-schema, not meta/hyper-schema or schema

This does produce the desired behavior. The double-lookup could be optimized by modifying the map rather than just adding to it, although that could cause its own complications.

Starting from the links schema, on the first pass there is no map, so the reference to meta/hyper-schema resolves directly, and these two mutually refer to each other without extension. That is also the correct behavior (which is why in the actual links schema, it refers to the hyper-schema dialect, not the hyper-schema vocab).

This is also the correct behavior.

So I agree that this proposal does what it says it does.

Is that the right thing to do?

I do think that the issue raised by @jdesrosiers is a concern. A key piece of the $recursive* system for me is that all points have to opt-in. You have to say you are extensible, the extension has to say it is an extension, and the reference has to say that it respects extension.

Of these three points, the "saying you are extensible" part is arguably the least important. The idea was that not all schema designers may want to allow extension, so be default it was disallowed. I have no idea if that's a real concern or not.

The "say it is an extension" part is obviously important. This proposal requires you to specify who you are extending, and subsumes the allOf that would otherwise do that. This is arguably more clear. It may technically be less flexible, but I'm not sure that we're losing anything useful.

The "reference says it respects extension" is, I think, critically important. We discussed this with the original $recursive* proposal. Having $ref magically change behavior (when it has never done such a thing before) is not something I'm comfortable with.

However, this could easily be "fixed" by pairing $extends with $extendedRef or something like that. We're back to two keywords, but I don't think there's any way to avoid that without fundamentally changing the nature of $ref to be more dynamic (although see the next section).

So: This is very close to what we've previously said we want, and I think by adding $extendedRef or similar, we could get close enough for this to meet the requirements. I need to think a bit more on the "having to say who you extend" part. Which brings us too...

Is this the right terminology?

Names are hard. $extends strongly suggests OOP. I intentionally avoided OOP terminology, because:

I'm pretty sure if we have an $extends people will use it everywhere that they can, whether it makes any sense or not. OO has been hammered into people for a couple decades now and it's the default thought pattern.

I'd really rather avoid OO terminology. The fact that $extends is "allOf plus additional semantics" feels a little weird, too. Perhaps I'm just too attached to the semantic/structural split idea, but I'd like to hear other thoughts on it.

What about the proposed $recursiveAnchor change?

In #907, $recursiveAnchor switches to a fragment name, and the recursion is tied to that fragment name. This is more explicit and controllable than the boolean, but you still have to look through all of the related schemas to figure out which one(s) also declare the same $recursiveAnchor. Or if we decided that all schema resources are extensible, you'd at least need to figure out who will get to a $recursiveRef to that anchor to figure out the connection.

As for the internal mechanism, I don't think there's a substantial difference. Either way you need to keep track of something defined higher up in the scope. Either "this extends that" or "this recursive anchor was first seen here".

So I think the implementation is roughly equivalent. The difference is that $extends leverages OO to be more accessible, and $recursive* are intentionally more obscure so that people don't use them unless they really need them. It's questionable as to whether that should be a goal.

OK so does this ramble have a point?

Um... maybe?

I'm cautiously open to a syntax more like $extends if it is paired with a specific reference keyword such that $ref and $whateverRef always behave consistently.

But I'm not a fan of $extends as a name. I also don't yet have a better idea.

I'm still a little uncertain about combining the "extends this schema" with the allOf to bring in that schema. I might be more open to it if those are split, perhaps annotating the $ref with an adjacent keyword. Which would bring it more in line with how we want to support OO. Is that good or bad? I'm not sure.

OK, enough rambling. I think somewhere between this and #909 we can hammer out an improved system, but I'm not sure what it is yet.

jdesrosiers commented 4 years ago

I do think that the issue raised by @jdesrosiers is a concern.

Then you didn't understand the point I was making because you proceeded to make almost exactly the same argument just with a lot more words.

handrews commented 4 years ago

@jdesrosiers I'm confused. I said "I DO think that the issue raised by @jdesrosiers IS a concern.

And then I used a lot more words. As I usually do. But to elaborate on my thinking. I don't know where I stand on this so laying out my thought process was the best I could do right now.

awwright commented 4 years ago

Good points all around and I will address them on more sleep.

...

Well, fine... two quick points:

(1) This doesn't have to be an OO system, however it seems reasonable to me for JSON Schema to support unmarshalling onto an OO type system.

(2) I was designing this so that being "extended" is obligatory — though if sometimes you need to recurse back to the literal $ref target and not the extending schema, it's possible to work around that with $defs or something.

If the goal is a cooperative system, that's fine too. But in my understanding, that actually makes this a bit more complicated, not less.

I think the next step is trying a few different proof-of-concepts. This is a very complicated proposal. Hopefully I can look at the other ideas, and rework this some, until we've got something that's intuitive.

jdesrosiers commented 4 years ago

@handrews Ha! Sorry, I read that wrong.

handrews commented 4 years ago

@awwright

This doesn't have to be an OO system, however it seems reasonable to me for JSON Schema to support unmarshalling onto an OO type system.

We've literally been talking about supporting this for the past year or two, and it's the primary driving use case for $vocabulary. And I've been talking up the "add an annotation vocabulary" to the OpenAPI folks for a year, and it's a major part of why they are adopting full JSON Schema compatibility in the next, soon-to-be-released, version.

This is a thing we want, but there's been a ton of conceptual work done, which has been validated to a significant degree with JSON Schema's largest "customer" community. If we want to talk about OO type system support, that is a far larger conversation and needs to build on the work that has already been done across both the JSON Schema and OpenAPI communities.


If the goal is a cooperative system, that's fine too. But in my understanding, that actually makes this a bit more complicated, not less.

In terms of implementation, it's essentially the same. If you're doing it as passing information down during evaluation, then:

In terms of usage, it depends on whether you think explicit or implicit is more clear.

I still strongly dispute that schemas are inherently polymorphic- that's not the model that we've used to date, it's contrary to the model we've promoted over the last two years, and it's an enormous philosophical change. In that sense, I think the ease-of-use of obligate extension is problematic. And we can solve OO type system problems without it by decoupling structure and semantics.

The cooperative model is less intuitive to the user, but more explicit. More importantly, it satisfies the principle of least power in that it solves exactly the mechanical problem of recursive references without saying anything about the semantic rationales for which such references are used.


I think the next step is trying a few different proof-of-concepts.

Keep in mind that @gregsdennis and (I think) @jdesrosiers have already implemented the existing $recursive* keywords, so we already have at least one and probably two proof-of-concepts already on that side. In fact, @gregsdennis saying he was able to implement the 2019-09 approach (before it was published) was a key reason that we went with it.

jdesrosiers commented 4 years ago

Yes, I've implemented everything from draft-04 to draft 2019-09 (plus OAS 3.1) with the exception of format. That includes vocabularies and $recursiveAnchor/$recursiveRef. It was difficult to understand these keywords, but it was fairly easy to implement.

handrews commented 4 years ago

@jdesrosiers that's what appeals to me about #909. As you noted over there, aligning $recursiveAnchor with $anchor and getting rid of the base URI talk would make it a lot more understandable.

Talking about base URIs seems to always confuse a lot of people, and the use of a boolean in the target schemas doesn't have any clear analogue anywhere that I know of. I think I was just trying to make it all as limited as possible.

handrews commented 4 years ago

@awwright given that the changes in #909 have been fairly well-received, and that it represents a much smaller change to the approach we already have, and (most importantly) given that we expect a lot of work on code generation that will fit very well with OO extension concepts, would you be OK if we go with #909 for the next draft, and keep this open for consideration when we get to code gen? I don't mean that this proposal should be restricted to code gen, I just think it's best considered alongside of it because of the OO implications.

My biggest concern is whether you think that #909, as I explain it three comments up from this one, is reasonably easy to implement. If so, I'd like to proceed with that, and defer the question of whether we want obligate extension polymorphism until the future.

Cooperative recursion is in some ways more complicated, but it is intended as a very much advanced under-the-hood feature. I'd like to find a way to make it a little easier without moving to an OO paradigm or rushing work that we should be doing in coordination with OpenAPI post-3.1.

awwright commented 4 years ago

@handrews Sure, I guess. I was hoping to get a proof-of-concept out the door sooner than I am. Then I could experiment more, "oh this works... this doesn't... I understand what's going on here" I think that's the first order of business in either of these issues. (Full disclosure, that's the next thing I'm working on in my implementation and I'm stuck on it.)

handrews commented 4 years ago

@awwright I'm not sure what you mean about a proof of concept for the other issue, two people have implemented the current version and said that that proposal would work fine in their implementation.

What more proof are you asking for for #909? I'm trying to make sure I'm not steamrolling over a real problem here, and this draft has a constraint of matching the next OAS release, which is hitting RC1 as early as today. So we can't just mess around with this on our own schedule, and I've devoted most of my JSON Schema time to this discussion this week, which means I've made zero progress on the long and growing set of issues that need to be finished for the draft.

We need to be wrapping up this draft, not opening it up, right now.

Relequestual commented 4 years ago

We need to be wrapping up this draft, not opening it up, right now. - @handrews

If we want to talk about OO type system support, that is a far larger conversation and needs to build on the work that has already been done across both the JSON Schema and OpenAPI communities. - @handrews

Yeah, I would strongly appose adding any more $ keywords, or changing behaviour like this, right now.

I'm going to leave this issue open but highlight it's low priority and not part of draft-8-patch-1 milestone, so we can focus.

I really want to put this issue on ice till draft 2020-whatever is out the door. We have at least two implementations which have working code for recursive, and it took me a few minutes to understand, but I'm super confident it makes a lot of sense.