Closed royclarkson closed 10 years ago
Ah, OK. This is a clear bug and part of the code I understand the less :/
I have done some changes in that code from the base implementation so maybe one of my changes are the cause here; @rwatler, ping! If you know where this can come from...
Let me take a look at this case. I'll see if I cannot get JSON Diff to generate the correct input for JSON Patch.
Randy
Thanks for the quick response. We've been experimenting with this library for a couple weeks. It appears to be one of the few (if only) java JSON PATCH libraries. It's great to see the work put into already. @briancavalier has been looking at some different javascript based libraries and there is some inconsistency with how multiple deletes are handled between those. He can probably provide more details about that.
No problem. I can confirm that the original version of the JSON diff algorithm back in version 1.2 does indeed generate the correct/expected patch. However, as @fge has indicated, there has been some rather heavy refactoring of the code since. This makes sense also because we have been using one of the older versions and have not run into this problem.
Now, we need to get the 1.6 version working correctly so that we too can upgrade!
@rwatler so, 1.2 is OK? Good, I'll try and bisect from there and see where things went wrong...
OK, I have a problem, I can't do it right now, I need to fix a bug in one of my other projects; this bug is #2 on my priority list. Unless @rwatler finds the culprit commit before me (which is likely)!
@fge and @rwatler Thanks for looking into this! As @royclarkson mentioned, he and I have been working on client and server communication using JSON Patch. I've been focused on the JavaScript client side, and have definitely seen a good bit of discrepancy wrt "remove" ops across the existing JS JSON Patch libs :(
For example, our current JS implementation's diff function generates a patch similar to the one json-patch
generates, ie from @royclarkson's example, altho reversed:
[{"op":"remove","path":"/2"},{"op":"remove","path":"/0"}]
Some other libs generate a patch like:
[{"op":"remove","path":"/0"},{"op":"remove","path":"/1"}]
Which, while certainly possible to apply incrementally, just seems weird :) IOW, it makes sense to a program, but not to a human (at least, not to this human!).
Previously, I had enabled support for move
operations in our lib (I disabled them in the short term, because we don't need them atm), and it would generate a patch like this:
[{"op":"remove","path":"/2"},{"op":"remove","path":"/0"},{"op":"move","path":"/0","from":"/1"}]
Note the additional move
. That seems like overkill, though.
I figure you've had to think through these issues as well, so I just wanted to see what your thoughts are on the various ways to represent a delete. The RFC seems to only provide very minimal guidance, saying that the patch must be able to be processed in order.
Thanks!
@briancavalier the problem is mainly that JSON Patch does not define "diff"; it will only ever be an addition at best. I did a first, very, very naive implementation and @rwatler came with a "factorizing" implementation. And I have managed to botch it.
I am not sure there exists a "canonical" diff implementation to be honest; for objects, there may be, but for arrays, we of course have the problem that indices may change as we remove or add elements into it.
The current (and buggy, thanks to me) implementation for arrays uses an LCS algorithm (Longest Common Subsequence) and works from there to determine what can be factorized between moves/deletions/etc in the whole target JSON text. At first, of course, my goal will be to fix that.
But afterwards, determining a "canonical" diff representation? Not sure whether that is possible. In the example mentioned above, we can either generate patch:
[ { "op": "remove", "path": "/0" }, { "op": "remove", "path": "/1" } ]
or:
[ { "op": "remove", "path": "/2" }, { "op": "remove", "path": "/0" } ]
depending on whether we start from the end of the array or the beginning. And even then there is the case of removing elements in the middle of an array etc, which obviously has an influence on all further operations altering elements after the removed elements (index changes).
OK, found the culprit commit via bisect:
755516df9ba3a7fa15c00241fca61c2e4bdbfe43
And of course, a lot has happened since then :/
Uhwell, I guess it's "revert all the way" to JsonDiff as it existed at this commit~1.
OK, I have reverted to what I said, basically, and the added test works now. @royclarkson, do you have the means to test by forking/building?
Note: the generated patch for the pair mentioned in this issue is now:
[ { "op": "remove", "path": "/0" }, { "op": "remove", "path": "/1" } ]
which works as intended.
I can do that. I'll pull the code and test later today. Thanks!
the problem is mainly that JSON Patch does not define "diff"
Yeah, exactly. The definition of "valid diff" is basically "produces expected results when fed through the algorithm described in rfc6902".
Thanks for digging into this, and for the super-fast turn-around!
@fge: do you want me to attempt a fix on your refactored version still?
@rwatler if you can spare the time I'd appreciate it! Also, I believe more tests need to be written... I have trouble isolating the "elementary" scenarios to test.
@fge: I have a fix for the refactored code. How do you want to proceed here? I can create a pull request that reestablishes the refactored code in one commit and the fix in another. Is that how you'd like to proceed?
Let me know,
Randy
@rwatler I have just created a branch named refactored
where I revert f176df0, can you apply the fix over this branch?
In the meanwhile I'm going to have a look at what I did wrong by looking at your diff ;)
OK, thanks to @rwatler for spotting the bug in my refactoring.
@royclarkson may I ask some more work from you? There is a refactored
branch which contains the fix over the new code, can you test with that as well?
In the meanwhile I'm going to add more tests...
@royclarkson, @briancavalier: any news?
@fge, sorry for the delay. I've done some simple tests, and the changes appear to be working in the refactored
branch. I put together a basic app for testing. You can see the examples in the readme. Thanks!
@royclarkson OK, thanks a lot! For reporting the bug in the first place and for testing.
Thanks @rwatler for noticing the bug in my refactoring and for the explanation of where I went wrong.
I'll release 1.7 immediately with this bug fix, it is important enough.
@fge I've done a few limited tests as well, and it's been working great so far.
@briancavalier OK, good to hear :)
1.7 has just been pushed to maven; it should be available soon.
@royclarkson I have noticed one thing in your test webapp: you use a plain ObjectMapper
; you should be careful with that since it will not exactly do the right job with numeric instances. By default, floating point values are deserialized as double
s. In fact, this is not much of a problem if the data is serialized with JavaScript in the first place since it can't do better than that, but if the JSON is sent as text, you may have surprises in test
operations; I use DeserializationFeature.USE_BIGDECIMAL_FOR_FLOATS
in JsonPatch.fromJson()
.
Ah, I see. Thanks for the warning. I'm actually not using that ObjectMapper in the test app. I added it before I realized I didn't need it. But.. internally to Spring MVC, I'm not sure how the ObjectMapper is configured for deserialization. I'll keep that in mind if we encounter any issues.
@royclarkson OK, I was a little too tired. .fromJson()
obviously doesn't do that. But all my mappers are configured using it.
Anyway, can you tell me when you've pulled 1.7 so that the issue can be closed?
I've updated to 1.7 and it's working as expected. Thanks again!
Consider the following source:
And the following target:
JsonDiff produces the following patch:
However, if you apply that patch to the original source, you get an exception:
The issue appears to be that deletes are applied sequentially. The path at index 0 is deleted, but then the resulting array only has length 2. Operating on the shortened array means there is now no path at index 2 to delete. For example the following patch will not throw an exception, however the first and THIRD nodes are deleted, not the first two as might be expected.