Open b5 opened 4 years ago
We've been relying on contiguous array representation. This isn't possible with objects, which don't have a defined total order. It's also a problem if we'd ever like to skip indices in array diff output, which I think we do.
Let's diff two objects:
djs.json
{
"dj dj booth": { "rating": 1, "uses_soundcloud": true },
"dj wipeout": { "rating": 2, "uses_soundcloud": true },
"com truise": { "rating": 4, "uses_soundcloud": true },
"Ryan Hemsworth": { "rating": 4, "uses_soundcloud": true }
}
djs_edits.json
{
"DJ dj booth": { "rating": 1, "uses_soundcloud": true },
"DJ wipeout": { "rating": 2, "uses_soundcloud": true },
"com truise": { "rating": 4, "uses_soundcloud": true },
"Ryan Hemsworth": { "rating": 4, "uses_soundcloud": true },
"Susan Collins": { "rating": 3, "uses_soundcloud": false }
}
Here's a first crack at a hand-constructed diff:
{
"header": {
"-": "djs.json",
"+": "djs_edits.json",
"stat": { "..." : null }
},
"script": [
[["-","com truise", "Ryan Hemsworth"],["+","com truise", "Susan Collins"]],
[" ","com truise", { "rating": 4, "uses_soundcloud": true }],
["-", "dj dj booth", { "rating": 1, "uses_soundcloud": true }],
["+", "DJ dj booth", { "rating": 1, "uses_soundcloud": true }],
["-", "dj wipeout", { "rating": 2, "uses_soundcloud": true }],
["+", "DJ wipeout", { "rating": 2, "uses_soundcloud": true }],
[" ", "Ryan Hemsworth", { "rating": 4, "uses_soundcloud": true }],
["+", "Susan Collins", { "rating": 3, "uses_soundcloud": false }]
]
}
To do object patching properly, it's easiest if we add another entry to the change tuple that specifies the key. In array changes we can list either a numeric or a string index.
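To make the keyed tuple idea concrete, here is a minimal Python sketch of applying such a script to an object. The function name `apply_object_script` is hypothetical, and treating entries whose first element is a list as skippable range headers is an assumption, not something the format above defines.

```python
def apply_object_script(source, script):
    """Apply a flat script of [change_type, key, value] tuples to a dict.

    Hypothetical helper: returns a patched copy and leaves `source` untouched.
    """
    patched = dict(source)
    for entry in script:
        if isinstance(entry[0], list):
            # Assumption: a list in first position is a range header
            # (like "@@" in git) and carries no change of its own.
            continue
        change_type, key, value = entry
        if change_type == " ":
            # Context entry: the key must already hold this value.
            if key not in patched or patched[key] != value:
                raise ValueError(f"context mismatch at key {key!r}")
        elif change_type == "-":
            if key not in patched or patched[key] != value:
                raise ValueError(f"cannot remove {key!r}: value differs or key missing")
            del patched[key]
        elif change_type == "+":
            patched[key] = value
        else:
            raise ValueError(f"unknown change type {change_type!r}")
    return patched


djs = {
    "dj dj booth": {"rating": 1, "uses_soundcloud": True},
    "dj wipeout": {"rating": 2, "uses_soundcloud": True},
    "com truise": {"rating": 4, "uses_soundcloud": True},
    "Ryan Hemsworth": {"rating": 4, "uses_soundcloud": True},
}
script = [
    [["-", "com truise", "Ryan Hemsworth"], ["+", "com truise", "Susan Collins"]],
    [" ", "com truise", {"rating": 4, "uses_soundcloud": True}],
    ["-", "dj dj booth", {"rating": 1, "uses_soundcloud": True}],
    ["+", "DJ dj booth", {"rating": 1, "uses_soundcloud": True}],
    ["-", "dj wipeout", {"rating": 2, "uses_soundcloud": True}],
    ["+", "DJ wipeout", {"rating": 2, "uses_soundcloud": True}],
    [" ", "Ryan Hemsworth", {"rating": 4, "uses_soundcloud": True}],
    ["+", "Susan Collins", {"rating": 3, "uses_soundcloud": False}],
]
patched = apply_object_script(djs, script)
```

Because every change names its key, the patch does not depend on the order in which the object's entries happen to be listed.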
Update on this that came from implementing qri-io/deepdiff#6:
The structure of a diff tuple is:
tuple = [ change_type, key, value, []tuple ]
The key element in the tuple is definitely needed, and can be one of three types: null, string, and number. Strings are still used as object keys, numbers are used for indexing into arrays, and null is used for referencing the root value itself.
The value element defines the current element. If value is non-null, the tuple should be 3 elements long (no []tuple section).
If the tuple has a []tuple element, the value element is ignored. A 4-element tuple indicates we're dropping into a compound type to describe changes (similar to the @@ part of a git diff).
This new output format is now in qri diff:
qri diff --format json a.csv b.csv | jq
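As a rough illustration of the tuple structure described above, the following Python sketch walks a nested script and flattens it into leaf changes with full paths. The `flatten` helper and the sample script are hypothetical and are not taken from deepdiff or from real `qri diff` output; using " " as the change type of a 4-element tuple is also an assumption.

```python
def flatten(script, path=()):
    """Yield (change_type, path, value) for every leaf change in a nested script."""
    for entry in script:
        change_type, key = entry[0], entry[1]
        # A null (None) key refers to the root value, so it adds nothing to the path.
        here = path if key is None else path + (key,)
        if len(entry) == 4:
            # 4-element tuple: the value is ignored, descend into the child tuples.
            yield from flatten(entry[3], here)
        else:
            yield change_type, here, entry[2]


# Hypothetical script: string keys address object fields, numbers address array items.
script = [
    [" ", None, None, [
        [" ", "com truise", None, [
            ["-", "rating", 4],
            ["+", "rating", 5],
            [" ", "uses_soundcloud", True],
        ]],
        ["+", "Susan Collins", {"rating": 3, "uses_soundcloud": False}],
        [" ", "tags", None, [
            ["-", 0, "house"],
        ]],
    ]],
]
changes = list(flatten(script))
```

Each resulting path mixes strings and numbers exactly as the key element allows, which is enough to locate any changed value from the root.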
We need to think about how to format & display diffs. We have two main display formats:
Git Combined Diff
Before we get to structure data diffs, I think it makes sense to start with git diffs, and build upward from there.
To get a quick example, I started with one of our recent commits by adding .patch to the end of a GitHub URL: https://github.com/qri-io/qri/commit/98f065115abec6a2e8eae5e682d5a5c4a70d1562.patch
This is the result:
The output is both human readable and machine parsable, and forms the basis for a lot of diff-representation UI.
The git documentation is the best place to start for understanding the format: git's combined diff format.
The git diff format itself is the source of the three lines of contextual padding.
Constructing an example manually
Looking at the above, I've constructed a patch by hand to use as an example. It removes three lines, adds one, and has the three lines of context:
To build a React component out of this, I think it's easiest to build a JSON data structure representation of this Unix format. A first stab:
The patch structure is a tuple that mimics the text output of the git combined diff, using a space to indicate no change, + for add and - for remove. If the first entry is an array type, it's equivalent to @@ in git diff. The [ [1,8], [1,5] ] are line change ranges. The second element in the tuple is always the text in question.
cc @dustmop
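One way to sanity-check the JSON patch structure is to render it back into diff-style text. The Python sketch below does that under two assumptions that the description above does not confirm: each range pair is read as [start, count], as in git hunk headers, and a header tuple's text (if any) is appended after the closing @@. The `render_patch` name and the sample patch are hypothetical.

```python
def render_patch(patch):
    """Render a list of [marker, text] tuples as unified-diff-style text."""
    lines = []
    for marker, text in patch:
        if isinstance(marker, list):
            # Array-typed first entry: the equivalent of "@@" in git diff.
            # Assumption: each pair is [start, count].
            (old_start, old_count), (new_start, new_count) = marker
            header = f"@@ -{old_start},{old_count} +{new_start},{new_count} @@"
            lines.append(f"{header} {text}" if text else header)
        elif marker in (" ", "+", "-"):
            # Space = unchanged, "+" = added, "-" = removed.
            lines.append(marker + text)
        else:
            raise ValueError(f"unknown marker {marker!r}")
    return "\n".join(lines)


# Hypothetical patch: one context line, two removals, one addition, one context line.
patch = [
    [[[1, 4], [1, 3]], ""],
    [" ", "first line"],
    ["-", "old second line"],
    ["-", "old third line"],
    ["+", "new second line"],
    [" ", "last line"],
]
rendered = render_patch(patch)
```

A UI component could iterate over the same tuples directly and pick a style per marker instead of building a string.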