josephburnett / jd

JSON diff and patch
MIT License
826 stars 38 forks source link

Patch or update mode #36

Closed JeffFaer closed 2 years ago

JeffFaer commented 2 years ago

Applying a patch like

@ ["service","dns","forwarding","cache-size"]
- "10000"
+ "9999"

to

{
  "service": {
    "dhcp-server": {
      "use-dnsmasq": "enable"
    }
  },
  "system": {
    "host-name": "router"
  }
}

currently fails with

Invalid path element service.

but ideally I'd like to see

{
  "service": {
    "dhcp-server": {
      "use-dnsmasq": "enable"
    },
    "dns": {
        "forwarding": {
            "cache-size": 9999
        }
    }
  },
  "system": {
    "host-name": "router"
  }
}

Would it be possible to have an option to patch if the field exists, but if the field doesn't exist just set it to the new value?

josephburnett commented 2 years ago

This is a great idea. A few comments on your implementation (https://github.com/josephburnett/jd/pull/37/commits/cb63c71d5eeb98ca64184b5c0dd333d8bc0292a6):

I don't want to add a one-off parameter to the path method: func (a jsonArray) Patch(d Diff, populate bool) (JsonNode, error). It's an important invariant that patch files contain all the metadata necessary to validate and apply a change. For example a patch file which treats a JSON array as a multi-set will use ["multiset"],{} in the path instead of a single index number (example).

The instruction to create elements of the path as necessary should also be specified in metadata. I like the tag assoc-in because it has semantics like the function. So you could say @ [["assoc-in"],"service","dns","forwarding","cache-size"].

Your example would be:

@ [["assoc-in"],"service","dns","forwarding","cache-size"]
- "10000"
+ "9999"

although

@ ["service",["assoc-in"],"dns","forwarding","cache-size"]
- "10000"
+ "9999"

should also apply cleanly.

The second issue is how to interpret - "10000". This reads "remove 10000" but that can't happen if the path doesn't already exist. Or if the path does exist but a different value (e.g. "9998") is already there. Should the patch fail to apply? The default and most strict behavior is to fail the patch. We want to say "put the value there, even if you have to overwrite something". Maybe we should drop the - "10000" since we don't care what's there (another reason I chose the tag assoc-in).

So your final patch would be:

@ [["assoc-in"],"service","dns","forwarding","cache-size"]
+ "9999"

Another interesting topic is how you should produce diffs with the behavior. But we can save that for later.

JeffFaer commented 2 years ago

Ah okay. Admittedly I didn't look too much into the set/multiset behavior. Adding "assoc-in" to the patch makes sense.

For my use case, the behavior I would want is that if the field already exists with a different value, applying the patch would fail. Another fun edge case I've found in testing with my patch is that array patches become significantly less useful:

$ jd foo.json bar.json 
@ ["foo",2]
- 3
+ 4
@ ["foo",3]
+ 5
$ jd -p -populate <(jd foo.json bar.json ) <(echo '{}')
2021/11/21 15:52:32 Addition beyond the terminal element of an array.

It seems like we'll need "assoc-in" patches to include the full array if we want to be able to apply them. What are your thoughts on that?

JeffFaer commented 2 years ago

https://github.com/JeffreyFalgout/jd/tree/assoc-in is in a relatively good state now. I changed the flag to be -assoc-in. You specify it on diff to generate a assoc-in patch. I still haven't tackled the JSON array issue though.

If it's something we want to support, we probably need to explicitly include a "no-op" patch for every element in the array. Or maybe we could just diff the entire array wholesale instead of diffing them elementwise

josephburnett commented 2 years ago

we'll need "assoc-in" patches to include the full array if we want to be able to apply them

I don't quite understand. What's in foo.json and bar.json in your example?

JeffFaer commented 2 years ago
$ cat foo.json 
{
    "foo": [1, 2, 3]
}
$ cat bar.json 
{
    "foo": [1, 2, 4, 5]
}
$ jd foo.json bar.json 
@ ["foo",2]
- 3
+ 4
@ ["foo",3]
+ 5

The problem is that the patch doesn't encode enough information about the foo array for us to be able to apply it in assoc-in mode. It's missing the first two elements, 1 and 2.

josephburnett commented 2 years ago

The point of encoding assoc-in behavior in the patch is so the patch itself will tell you if it should be used. You only specify assoc-in during diff. Of when constructing the patch manually.

For example, both of these patches will result in the bar.json when applied to foo.json:

@ ["foo",2]
- 3
+ 4
@ ["foo",3]
+ 5
@ [["assoc-in"],"foo",2]
+ 4
@ [["assoc-in"],"foo",3]
+ 5

You could write the patches by hand or create them with jd foo.json bar.json and jd -assoc-in foo.json bar.json respectively. When the global -assoc-in flag is set, diff should emit assoc-in metadata for any additions or replacements.

JeffFaer commented 2 years ago

Right. I think we might be talking past each other a bit here.

In my use case, I'm effectively applying -assoc-in patches to an empty JSON object:

$ jd -assoc-in foo.json bar.json
@ [["assoc-in"],"foo",2]
- 3
+ 4
@ [["assoc-in"],"foo",3]
+ 5

$ jd -p <(jd -assoc-in foo.json bar.json) <(echo "{}")
2021/12/06 18:50:22 Addition beyond the terminal element of an array.

The problem is that the patch generated at first doesn't contain enough context to be able to apply it to an empty object. It's missing the first two elements of the "foo" array:

$ cat foo.json 
{
    "foo": [1, 2, 3]
}
$ cat bar.json 
{
    "foo": [1, 2, 4, 5]
}
$ jd foo.json bar.json 
@ ["foo",2]
- 3
+ 4
@ ["foo",3]
+ 5

The patch starts at element 2, which is beyond the terminal element of the array (the terminal element being the 0th element since the array does not yet exist).

The behavior I want is that the full array is encoded in the assoc-in patch so that I don't run into this error, and the patch gives me a partial copy of bar.json.

This would mean that when generating diffs with assoc-in set, we would diff the entire JSON array instead of only the elements that were different.

1) Do we want that behavior to be baked into assoc-in, or should it be controlled by a separate flag that can be used in tandem with assoc-in? 2) What behavior do we want if the assoc-in patch is missing some array elements? Do we want it to yield an error like above, or do we want it to silently populate the array with null elements up until the first patched element?

josephburnett commented 2 years ago

Let's not support assoc-in patches in arrays. That's on par with JSON merge patch limitations. If we are producing an assoc-in patch then when we encounter an array we just emit the entire thing into the patch. And if we are applying an assoc-in patch and we encounter an array index then it's an error.

So a valid patch would be:

@ [["assoc-in"],"foo"]
+ [1,2,3,4,5]

Out of curiosity, do you have a use case for patching in an array? I would like to ground the patch semantics in real-world use-cases as much as possible.

josephburnett commented 2 years ago

Since we've landed on semantics that are nearly identical to JSON merge patch, perhaps we should name this metadata as merge instead of assoc-in. @JeffreyFalgout what do you think?

It would also facilitate outputting diffs in JSON merge format. I've started working on #1 to process standard JSON patches.

josephburnett commented 2 years ago

@JeffreyFalgout do you want to take a crack at implementing the array behavior ^^^ ? Or would you like me to?

JeffFaer commented 2 years ago

Doing the full array for assoc-in patches SGTM. That's the direction I was leaning anyway. Renaming it to merge instead of assoc-in makes sense. I didn't realize there was another RFC along the same lines.

I'll try and find some time this weekend to make those changes

josephburnett commented 2 years ago

Great, thanks! We should aim to have lossless translation between JSON Merge Patches and jd patches with merge metadata on all hunks. I think our current proposal does because we'll emit a hunk for every leaf where a leave is either an array or a scalar value in a map.

I'm also working on having translation between JSON Patches (the imperative RFC) but I can't yet support the whole thing (#1).

JeffFaer commented 2 years ago

I didn't get around to it this weekend :(

I might get around to it at some point later this week, but if I don't it probably won't happen until early January sometime. If you're feeling motivated I won't be offended if you want to take this over and make the changes we were talking about

josephburnett commented 2 years ago

No rush. Happy holidays!

josephburnett commented 2 years ago

@JeffreyFalgout I think I'll take this one back and take a crack at it.

JeffFaer commented 2 years ago

Oh shoot, sorry @josephburnett. This totally fell off my radar with the new year. I'm happy to let you take this one, though. I still likely won't have much time to contribute here, still :(

josephburnett commented 2 years ago

No problem at all. Thanks for the great idea!

josephburnett commented 2 years ago

This is now available as the RFC 7396 compatible “merge” format in v1.6.0. Use it with the -f merge flag on the command line or in the UI.