I'd personally prefer requiring the more expressive one, JSON Patch, by default (and either also requiring JSON Merge Patch or making it optional). Even though JSON Patch is more cumbersome, null is, imo, not that obscure of a feature to forgo if support is being added for one or the other.
You'll have to clarify what you mean by "And again", because this is an entirely new issue in the tracker. Can you provide some use cases for the problems this solves?
@Acubed I have mentioned it countless times on the forums, but anyway, here it goes again...
The schema at uri1 reads:
{
  "description": "base schema",
  "type": "object",
  "properties": { "p1": { "type": "string" } },
  "additionalProperties": false
}
The schema at uri2 wants to extend that schema so that p1 has a minimum length of 4, and to add a property p2 of type integer:
{
  "$patch": {
    "source": { "$ref": "uri1" },
    "with": [
      { "op": "add", "path": "/properties/p1/minLength", "value": 4 },
      { "op": "add", "path": "/properties/p2", "value": { "type": "integer" } }
    ]
  }
}
That is with JSON Patch.
With JSON Merge Patch that would be:
{
  "$merge": {
    "source": { "$ref": "uri1" },
    "with": {
      "properties": {
        "p1": { "minLength": 4 },
        "p2": { "type": "integer" }
      }
    }
  }
}
This means there is no need for hacks such as "limiting only to defined properties", etc.
Unambiguous, and it Just Works(tm).
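For reference, applying either patch above to the base schema at uri1 yields the same resolved schema (per standard RFC 6902 and RFC 7396 semantics):

{
  "description": "base schema",
  "type": "object",
  "properties": {
    "p1": { "type": "string", "minLength": 4 },
    "p2": { "type": "integer" }
  },
  "additionalProperties": false
}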
And yes, I said "again", since I have been proposing this idea for more than a year.
@fge Please understand it needs to get posted here, too, for the record. Thanks!
Note that with JSON Merge Patch, null is used to indicate deletion of a value, but there is no mechanism in that specification for adding the value null. So while JSON Merge Patch is a nice option to have around, since it is more compact and easier to see at a glance what the changes are, it has this deficiency regarding null values.
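The whole of RFC 7396 boils down to a few lines; a minimal sketch in TypeScript (the Json type alias is our own, not from any library) makes that limitation obvious:

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Minimal sketch of the RFC 7396 merge algorithm.
function mergePatch(target: Json, patch: Json): Json {
  // A non-object patch (including arrays and null) replaces the target wholesale.
  if (typeof patch !== "object" || patch === null || Array.isArray(patch)) {
    return patch;
  }
  const result: { [key: string]: Json } =
    typeof target === "object" && target !== null && !Array.isArray(target)
      ? { ...target }
      : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key]; // null always means "remove"; there is no spelling for "set to null"
    } else {
      result[key] = mergePatch(key in result ? result[key] : null, value);
    }
  }
  return result;
}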
@brettz9 yes, I am fully aware of it; however, no JSON Schema keyword currently allows null as a value, so this remains a viable option.
Maybe this will not be the case in v5 though.
"type": "null"
?
@epoberezkin that is the JSON string "null", not JSON null.
Ah. I now understand what you mean - there is no null in JSON Schema itself indeed.
@fge "enum" does permit null, doesn't it?
null can be used in enums, but it is OK for $merge, as it replaces the whole array. Only constant will be an issue, but it's an edge case really.
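To illustrate (the enum values are invented): because a merge patch value that is an array replaces the target array wholesale, a null inside it survives as an ordinary element rather than triggering a deletion:

{
  "$merge": {
    "source": { "$ref": "uri1" },
    "with": { "enum": ["a", "b", null] }
  }
}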
@fge, just to clarify. The result of both $merge and $patch is the patched schema, there is no assumption that the original schema gets modified, correct?
So I can create schemas like this:
{
  "id": "my-meta-schema.json",
  "$merge": {
    "source": "http://json-schema.org/draft-04/schema#",
    "with": {
      "properties": {
        "myCustomKeyword": { "type": "string" }
      }
    }
  }
}
I agree with @fge that $merge covers literally all real use cases. It doesn't allow adding nulls or adding/removing individual array items, but I think we can live without that. Drop $patch, maybe?
@epoberezkin yes, the result is the patched schema; the original schema is not touched in any way, shape or form.
Of course, this means requiring a preprocessing of some sort, but then so does $ref, and it still retains priority.
With the current standard it's relatively easy to extend a schema that is not recursive without merge/patch; allOf does a good enough job.
But for recursive schemas this approach simply doesn't work, so there is no mechanism to extend meta-schemas or any other recursive schemas, which is a shame... So $merge is really important.
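To sketch the recursion problem (the "tree" schema here is invented for illustration): extending this schema via allOf constrains only the root, because the inner "$ref": "#" still resolves against the original, unextended document, whereas a $merge produces a new schema whose "$ref": "#" points at the patched whole:

{
  "id": "tree",
  "type": "object",
  "properties": {
    "value": { "type": "string" },
    "children": { "type": "array", "items": { "$ref": "#" } }
  }
}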
@erosb only as a member in an array.
But then JSON Merge Patch doesn't allow modifications of array elements individually. If you specify an array, the whole array is replaced.
JSON Patch on the other hand allows patching of arrays without a problem.
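For example (the enum contents are invented), to append "c" to a schema containing "enum": ["a", "b"], JSON Patch can address the new element directly (the "-" token appends, per RFC 6901/6902), while JSON Merge Patch must restate the entire array:

[ { "op": "add", "path": "/enum/-", "value": "c" } ]

versus

{ "enum": ["a", "b", "c"] }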
@epoberezkin that is not correct. Your schema would be:
{
  "id": "my-meta-schema.json",
  "$merge": {
    "source": { "$ref": "http://json-schema.org/draft-04/schema#" },
    "with": {
      "properties": {
        "myCustomKeyword": { "type": "string" }
      }
    }
  }
}
The argument to source is a JSON Schema, not a URI to a JSON Schema.
Got it. Yes, that makes sense. This way you can both patch some third-party schema and use the same patch to patch multiple schemas, without the need to put them in separate files.
FWIW, JSON Patch also has copy and move, which are especially useful when one wishes to keep in sync with the referenced document, however it changes... JSON Merge Patch not only can't do this, but it is also ambiguous as to whether a change was the result of a move or just a deletion+addition.
(And incidentally, if there is ever a JSON Merge Patch 2.0, I hope it can avoid the null problem by using something like "\u0000" in place of null (and then requiring strings that begin with an actual NUL to be preceded by one extra NUL), as well as allow modification of strings by allowing for inclusion of the previous string, e.g., "Prepending some text to the old value: ${current}", with \${ allowing the literal.)
@brettz9 (Or just introduce a "delete" operation.)
@Acubed: JSON Patch has the "remove" operation, but I'm speaking of JSON Merge Patch, which doesn't have, and wouldn't work with, operations per se; it just mimics the appearance (of the altered portions) of the document (as in @fge's example), with the exception of null, which it overrides. Overriding a specific string (and providing an escape option for it, as I mentioned) would allow one to avoid the ambiguity.
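Under that hypothetical convention (the property names are invented), a single merge patch could express all three cases:

{
  "a": null,
  "b": "\u0000",
  "c": "\u0000\u0000x"
}

Here "a" would be set to an actual null, "b" would be deleted, and "c" would be set to the literal string "\u0000x".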
The following is my ideal JSON patch/merge approach which I'll call "Power Patch". It has the full expressivity of JSON Patch, but in a more succinct form, and it can optionally include JSON Merge Patch patches and/or "Whole document" patches (explained below).
The paths mentioned below are, as in JSON Patch, JSON Pointers.
value can be any JSON value.
Order is significant, even for the keys of an object (which, incidentally, ECMAScript now assures).
mergePatchValue would behave as patch objects do in JSON Merge Patch, with the exception (discussed earlier) that null could be used as an actual value to add to a document, while a single "\u0000" would trigger deletions instead (with other strings starting with NUL requiring an additional NUL to be added at the beginning). The advantage of this approach is that it allows a clear indication of where modifications are being made, without needing to redefine the whole document and without needing to specify the operations (though it is limited to JSON Patch's add, replace, and remove).
docValue would be shaped similarly to mergePatchValue, but instead of using NUL for deletions, this object would cause any properties which had been present before and which were not specified now to be deleted. Although this might technically not be what we think of as a patch, its value lies in ensuring that each snapshot reflects the whole structure at a given moment, without explicitly needing to delete unused items. And with its integration into "Power Patch", users have the option of switching to or from this approach as desired.
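A hypothetical illustration of the "whole document" semantics:

// target document
{ "a": 1, "b": 2 }
// docValue
{ "a": 1, "c": 3 }
// result: "b" is gone simply because it was not restated
{ "a": 1, "c": 3 }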
The rest of the properties (which are inspired by, and expand on, JSON Patch) offer the advantage of allowing for moving, copying, and testing. Unlike the other approaches, by looking at any given patch, you are able to tell what parts were assumed to be already present (the paths) and what parts are new (the values). In order to allow the JSON Patch properties to work with less redundancy (as in JSON Merge Patch or the "Whole document" method), I've added basePaths, which is an object whose key is a base path and whose value is itself a Power Patch object (which can therefore have its own "basePaths" for nesting paths further).
The following is a single patch showing all properties, though each property is optional. These patches could also be put within an array (or an object whose keys are versions) to indicate a sequence of patches to be applied.
{
  merge: mergePatchValue,
  whole: docValue,
  basePaths: {
    path1: {
      // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
      add: {
        path2: value
      },
      basePaths: {
        path2: {
          replace: {
            path3: value
          }
        }
      }
    }
  },
  test: {
    "/path1": value
  },
  add: {
    "/path1": value
  },
  replace: {
    "/path1": value
  },
  move: {
    "/fromPath1": "/destinationPath1"
  },
  copy: {
    "/fromPath1": "/destinationPath1"
  },
  remove: ["/path1"]
}
The only disadvantage I see to this approach (as compared to JSON Patch) is that if you want to allow some kind of custom options to a given operation, you'd have to add them as a property outside of the operation object (e.g., copyOptions). But I think the succinctness is worth it.
One other consideration: as pertains to JSON Schema, and if this format were to also be used for versioning (especially for offline storage like IndexedDB), move and copy might also be used as a signal to the application that the content itself ought to be moved or copied when there is a schema change (since the other operations would not provide enough information, e.g., whether a remove+add was deleting the schema property and its associated content or just moving it).
@brettz9 there is a reason why JSON Patch uses arrays, and your proposal demonstrates it: you cannot rely on key order.
In your example it is requested that basePaths be evaluated before anything else... and that won't happen!
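For instance (the keys are invented), even engines that follow the ES2015 own-property-key order put integer-like keys first, regardless of insertion order:

const patch = { replace: { "/a": 1 }, "0": "surprise", add: { "/b": 2 } };
// Integer-like keys are enumerated first, ahead of string keys in insertion order:
console.log(Object.keys(patch)); // ["0", "replace", "add"]

And JSON itself gives no ordering guarantee at all.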
Actually, ECMAScript is being redefined to allow for predictable iteration order: http://tc39.github.io/ecma262/#sec-ordinary-object-internal-methods-and-internal-slots-ownpropertykeys . (And FWIW, in my example it is not basePaths alone that needs to work first; that is just an optional item. But yes, iteration order does need to be reliable, and thankfully, it appears that with IE no longer a sticking point, iteration order is becoming reliable again...)
@brettz9 ECMAScript maybe. JSON? Won't happen, for backwards compatibility reasons.
Yeah, thanks, @fge, I do see the JSON spec mentions it being "unordered".
(And FWIW, I may have mistakenly repeated this comment, since the for-in order described at https://tc39.github.io/ecma262/#sec-enumerate-object-properties (which is indicated by https://tc39.github.io/ecma262/#sec-runtime-semantics-forin-div-ofheadevaluation-tdznames-expr-iterationkind ) states that "The mechanics and order of enumerating the properties is not specified but must conform to the rules specified below", and the only rule relevant to OwnPropertyKeys is "EnumerateObjectProperties must obtain the own property keys of the target object by calling its [[OwnPropertyKeys]] internal method", which doesn't suggest that for-in must iterate in the order of those keys. However, one could apparently use Object.getOwnPropertyNames safely, as that refers to https://tc39.github.io/ecma262/#sec-getownpropertykeys , which relies on OwnPropertyKeys, which does abide by first-in order, more or less.)
I hope there will be a JSON 2.0 which fixes this and the separator problem.
But regardless, the format proposed here can still be readily modified to be JSON-friendly (and even made preferable to my earlier version, since it allows multiple operations of the same type at the same level, e.g., there could be one add at the beginning and one at the end; it also allows easier access to the operation type, and if custom options were needed, they could be added as a third argument):
[
  ["basePaths", [
    ["path1", [
      // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
      ["add", [
        ["path2", value]
      ]],
      ["basePaths", [
        ["path2", [
          ["replace", [
            ["path3", value]
          ]]
        ]]
      ]]
    ]]
  ]],
  ["test", [
    ["/path1", value]
  ]],
  ["add", [
    ["/path1", value]
  ]],
  ["replace", [
    ["/path1", value]
  ]],
  ["move", [
    ["/fromPath1", "/destinationPath1"]
  ]],
  ["copy", [
    ["/fromPath1", "/destinationPath1"]
  ]],
  ["remove", ["/path1"]]
]
@brettz9 but why?
JSON Patch is an RFC, and implementations exist. Why go to the trouble of defining another alternative?
Because it is much less ugly and verbose, especially without basePaths... I also like the hybrid option, since one can use the pseudo-JSON Patch for move and copy, and the improved JSON Merge Patch for everything else.
@brettz9 beauty is in the eye of the beholder... I myself find JSON Patch to be terse, short and to the point, and your proposal to be needlessly complicated :p
Beauty is in the eye of the beholder (and the above could use a sequence of objects within the array instead, if preferred), but terseness is not... Ignoring the merge/whole options, here's what the equivalent in JSON Patch would require:
[
  {"op": "add", "path": "/path1/path2", "value": value},
  {"op": "replace", "path": "/path1/path2/path3", "value": value},
  {"op": "test", "path": "/path1", "value": value},
  {"op": "add", "path": "/path1", "value": value},
  {"op": "replace", "path": "/path1", "value": value},
  {"op": "move", "from": "/fromPath1", "path": "/destinationPath1"},
  {"op": "copy", "from": "/fromPath1", "path": "/destinationPath1"},
  {"op": "remove", "path": "/path1"}
]
Using value as "value", the stringified length is 405, whereas with my version it is only 323. And that is using basePaths in an example where the paths aren't long, and without taking advantage of the fact that each operation can have child operations (which my example only hinted at), which would avoid further duplication.
@brettz9 maybe, but JSON Patch is imnsho more readable.
Also, your paths are wrong. JSON Pointers are always absolute.
Ah yes, thank you, fixed, including the counts (I didn't change basePaths, since it is designed to avoid the need for absolute paths).
FWIW, it could instead be expressed as the following (also length 323):
[
  {basePaths: [
    {path1: [
      // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
      {add: [
        {path2: value}
      ]},
      {basePaths: [
        {path2: [
          {replace: [
            {path3: value}
          ]}
        ]}
      ]}
    ]}
  ]},
  {test: [
    {"/path1": value}
  ]},
  {add: [
    {"/path1": value}
  ]},
  {replace: [
    {"/path1": value}
  ]},
  {move: [
    {"/fromPath1": "/destinationPath1"}
  ]},
  {copy: [
    {"/fromPath1": "/destinationPath1"}
  ]},
  {remove: ["/path1"]}
]
or if one did not need child arrays (length 303):
[
  {basePaths: [
    {path1: [
      // Operations here would use paths (e.g., path2) relative to the base path (i.e., path1)
      {add: {path2: value}},
      {basePaths: {
        path2: {
          replace: {path3: value}
        }
      }}
    ]}
  ]},
  {test: {"/path1": value}},
  {add: {"/path1": value}},
  {replace: {"/path1": value}},
  {move: {"/fromPath1": "/destinationPath1"}},
  {copy: {"/fromPath1": "/destinationPath1"}},
  {remove: "/path1"}
]
Note that basePaths is not required--I'm just trying to show the full expressivity here. I think the latter part of the latter example ought to be clear in its purpose--and I think superiority--without all the extra cruft.
Inventing new syntax is a bad idea. We should stick to relying on existing RFCs.
I'd like to derail this conversation a bit and point out that just because we can do something doesn't mean we should. I've seen a couple of examples to support patch and merge; all are based on the fact that a schema author closed their schema to additions. Shouldn't we respect that? If we want to extend a closed schema, why don't we copy the schema, make our edits, then rename it and publish it under a new URI? That way we don't add a crapload of complexity to our validators, keeping them fast and free from bloat. Yes, I understand that programmers are inherently lazy SOBs; I'll put my hand up for that category. That's why I learned to program in the first place: I wanted to automate tasks that I didn't want to do. So instead of modifying the spec to add/merge/delete/modify, why not create a toolset?
And as an aside, it's interesting that people are referencing RFCs that build upon JSON Schema, but the schema itself is not an RFC. It's an Internet-Draft, and one that has expired at that. Unless I am completely mistaken; if not, the spec needs to be updated to reflect this.
@robbie-mac: the problem as I see it is that sometimes we want to stay in sync with an existing schema. It's not just about initial effort, but ongoing maintenance...
@brettz9 A good point. However, I agree with @JanTvrdik; I don't think this has a place in this spec, specifically because this spec defines a data format for describing other data. It does not, nor should it, include methodology for manipulating that data. Besides, if either the merge or patch specs get approved, then you can use them on the data; you would just have to validate your source JSON objects against their respective schemas. As it sits now, I think a simple change to the spec, albeit with huge implications for validators, is to modify the definition of $schema: instead of a single string that is a URI, make it an array of URI strings. That way you can contain a chain of schemas, with the last (or first) entry being the terminus, a schema that is self-describing. Also, change the application of references to be able to reference ANY of the schemas in that chain.
Now you can have a listing of the progression of changes, one that can be $ref'd for different values.
@fge I implemented $merge and $patch in the Ajv validator. Thanks.
These keywords are a bit more complex to use and to implement than they seem on the surface, because of the $ref resolution in the source and in the patch (in the $merge keyword the patch can be a schema).
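For anyone wanting to try it, registration looks roughly like this (a sketch based on the ajv-merge-patch package; check its README for the exact, current API):

var Ajv = require('ajv');
var ajv = new Ajv();
require('ajv-merge-patch')(ajv); // registers the $merge and $patch keywords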
I'd like to derail this conversation a bit and point out that just because we can do something doesn't mean we should. I've seen a couple of examples to support patch and merge; all are based on the fact that a schema author closed their schema to additions. Shouldn't we respect that? If we want to extend a closed schema, why don't we copy the schema, make our edits, then rename it and publish it under a new URI?
My initial feelings lean entirely this way as well.
the problem as I see it is that sometimes we want to stay in sync with an existing schema. It's not just about initial effort, but ongoing maintenance...
Interesting. I think if that can be addressed outside of JSON Schema, that would be best. Perhaps one could set up a server that watches a particular schema at one URI, patches it, and then serves it at another URI? Would that cover your use case, @brettz9?
If it doesn't make it into JSON Schema, @seagreen, I would probably be more interested in a client-side library that does the resolving/patching first, based on my own custom merge/patch meta-language (like JSON References), with the remote schema made available to the client via a git submodule or npm bringing it to my server. I prefer to avoid server scripts when possible.
I'd like to see some use cases where (1) copying an existing schema and modifying it is preferable, and (2) this proposal is preferable to copy-and-modify.
The first set you presented, @awwright , is fine for:
The second set is far preferable for:
I think using $merge/$patch not only helps to maintain the schema - it allows one to easily see the difference between the source and the resulting schema. If, for example, I need to extend a meta-schema to include schemas for some custom properties, they will be clearly visible with $merge/$patch. Same if I add/remove requirements that I want to impose on the schemas.
Programs must be written for people to read, and only incidentally for machines to execute.
[I'm coming back to JSON Schema after some time away from working with it, so I apologize if this comment misses something from the last year and a half or so or just exposes how rusty I am with this stuff]
A good use of $merge is with a field like readOnly, where combining multiple values using allOf doesn't make sense. What would it mean to have an allOf where readOnly is false on one side and true on the other? Using $merge (or $patch) you can easily express "just like this other thing, except you can't modify it."
Another useful situation is with a reusable type that has a serviceable default value for "description", but where in many cases that description should be overridden with one specific to the use of the type. Using allOf for this results in multiple descriptions; there are ways around the problem, but none of them are as clear and readable as $merge.
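For instance (the schema URI and description text are invented), overriding just the description with the proposed $merge reads:

{
  "$merge": {
    "source": { "$ref": "address.json" },
    "with": { "description": "The customer's billing address" }
  }
}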
@epoberezkin:
I think using $merge/$patch not only helps to maintain the schema - it allows one to easily see the difference between the source and the resulting schema.
Great point.
"Programs must be written for people to read, and only incidentally for machines to execute."
Ehrrrm. I think correctness of a specification is more important than the readability of programs using the specification. We're still trying to figure out how $ref works in JSON Schema draft 4 (take a look through the issues and PRs here), so adding more features that will interact with $ref scares me. This doesn't mean that we shouldn't (the advantages could outweigh the disadvantages); I'm just trying to explain my feeling here.
@handrews:
Another useful situation is with a reusable type that has a serviceable default value for "description", but where in many cases that description should be overridden with one specific to the use of the type. Using allOf for this results in multiple descriptions; there are ways around the problem, but none of them are as clear and readable as $merge.
Nobody's saying you shouldn't have $merge, just that it shouldn't be part of the JSON Schema spec. I obviously agree that $merge is really useful. At the same time, it adds no actual new validation capabilities. (This is different from $ref, which lets us express infinitely recursive schemas, which couldn't be done without $ref.)
You guys are making great arguments for the expressive power of $merge and $patch, though. I feel like the desire to express "this URL is just like this other one, except for $patch such-and-such" will be a commonly occurring situation even outside of JSON Schema. Would it be possible to address that with a separate protocol? Then it wouldn't have to be custom-baked into many different specifications. (Also, I think in general we're bad at expressing "this information can be purely derived from that information" and would love to hear more thoughts on this. It's an important problem.)
My only real concern about how the specifications are arranged or combined is that if the json-schema spec reads in a way that causes implementors to generally not support $merge and $patch, that's a big usability problem.
For interoperability, it is good for json-schema to indicate preferred preprocessing mechanisms. Otherwise we'll have a proliferation of options and tools.
What we need is a guarantee that, given a set of tools that follows the json-schema specification, we can be sure that any site that also follows the json-schema specification will work with those tools. If essential pre-processing is left out, then there is an extra API-specific step of figuring out which preprocessing you need to do. That will be a barrier to broad, consistent adoption.
It won't matter how well json-schema validates anything if it's too awkward to use as specified, and too non-standard to use effectively with additional specifications that make it usable.
Ideally, $merge and $patch could (like $ref) be their own RFC (or RFCs), with json-schema specifying that a conforming implementation should support them, and implement pre-processing and lazy evaluation in such-and-such a manner.
My feelings on $merge and usability come from my experience as one of the architects of Riverbed Technology's APIs (although I left Riverbed about a year and a half ago and am speaking only for myself here). You can see $merge used extensively in the service definitions for the SteelHead and SteelCentral Controller products.
And again, those are the only two reliable mechanisms allowing for truly extending schemas in an unambiguous fashion (yes, I love unambiguous definitions). Recall:
- $merge relies on JSON Merge Patch, aka RFC 7396;
- $patch relies on JSON Patch, aka RFC 6902.
Why, then, define $patch? Simply because it allows for schema alterations which $merge cannot do. However, $merge answers the vast majority of cases.
Recall the rules: $ref still takes precedence.