Open netllama opened 4 years ago
Can you provide a smaller example with current and expected results?
There are some limitations to list comparison:
Thanks. The limitations that you noted are definitely present in the data that I'm working with. It that's expected behavior right now, that's fine. I'll need to use something other than dictdiffer for the project.
I am confused by the statement: ”it can not detect changes in the order of elements”.
Since:
list(dictdiffer.diff([1,2],[2,1])) [('change', [0], (1, 2)), ('change', [1], (2, 1))]
Can you expand a bit?
Now I explained it to myself. The ideal result, at least for some use cases, would be some kind of a ”swap” operation, instead of the two ”changes”.
Yes, exactly. Sometimes, the order of dicts inside the list doesn't matter.
You should be able to reproduce my original issue by creating a second list of dicts (based on the one that I provided), and making a few minor changes:
prefix
key. Then diff the original and new lists. It should only report the addition of 1 new dict, but in my results, it reports multiple other changes erroneously.I think your first issue is covered by @jirikuncar’s explanation of list diffs - everything is ”different” from the addition onward. Your second issue I think stems from the fact that dictdiffer does not compare dicts by identity, but by content - if you have shuffled the list, changing the first element, dictdiffer detects the difference in specific content within the dict, not in its identity.
If you find a better solution for your needs, please share your findings here as well.
I have been experimenting with storing JSON structures in Git to do diffs on big nested objects.
You could do something like:
$ pip install git_json_tree
$ ipython
import subprocess
import git_json_tree
repo = git_json_tree.Repo('/tmp/git_json_tree')
sha_original = git_json_tree.encode(repo, ["your example goes here"])
sha_modified = git_json_tree.encode(repo, ["your modified example goes here"])
subprocess.call(['git', 'diff', sha_original, sha_modified])
@jirikuncar I'm running into a similar issue with comparing both lists and sets... if I have two lists A,B
list_a = ["changeA", "changeB"]
list_b = ["changeA", "changeC", "changeB"]
Dictdiff correctly finds the ADD
"changeC", but is also reporting a change in index 1 from changeB to changeC. Is there a way to ignore this?
I tried converting the list into sets (since they are unordered and unindexed):
set1 = set(["changeA", "changeB"])
set2 = set(["changeA", "changeC", "changeB"])
But dictdiff is still reporting a CHANGE
as
<class 'list'>: [
('add', '', [(0, {'changeC'})]),
('change', '', ({'changeA', 'changeB'}, {'changeA', 'changeC', 'changeB'}))
]
@danielduhh I have tried with lastest dictdiffer
version.
In [1]: import dictdiffer
In [2]: list_a = ["changeA", "changeB"]
...: list_b = ["changeA", "changeC", "changeB"]
In [3]: set1 = set(["changeA", "changeB"])
...: set2 = set(["changeA", "changeC", "changeB"])
In [4]: list(dictdiffer.diff(list_a, list_b))
Out[4]: [('change', [1], ('changeB', 'changeC')), ('add', '', [(2, 'changeB')])]
In [5]: list(dictdiffer.diff(set1, set2))
Out[5]:
[('add', '', [(0, {'changeC'})]),
('change', '', ({'changeA', 'changeB'}, {'changeA', 'changeB', 'changeC'}))]
I can confirm that the result for sets looks wrong. Can you submit a PR with test case and expected results?
@danielduhh can you check if https://github.com/inveniosoftware/dictdiffer/pull/133 is solving your problem?
FYI, this will be useful for those of us wishing to implement Strategic Three Way Merge Patch for the Kubernetes API :) for things like deployments, where you can have big lists of environment variables that look like so:
{
"name": "MEDIA_URL",
"valueFrom": {
"configMapKeyRef": {
"key": "MEDIA_URL",
"name": "my-secret"
}
}
}
AFAIK the kubectl
cli tool uses a schema that has pre-specified "patch merge keys" to essentially treat each element of the list like a value that is keyed by a specified property within (in the case above, it'd be "name"
)
as of dictdiffer 0.8.1, I'm still getting the wrong results for lists of dicts like this. I don't think you could solve this in a truly generalized way, but allowing schema-based diff could be a good approach.
@Datamance basically you would like to provide a custom diff
function for certain keys?
That could work, and I think you could use that as a building block for the common case of schema-based (particularly JSONSchema) diffing/patching.
So basically one shoud provide mapping from dotted_node
to a function with a same signature as diff
:
def diff(..., key=None):
# after dotted_node ~ :158
if dotted_node in key:
for item in key[dotted_node](first, second, node=node, ignore=ignore, path_limit=path_limit, expand=expand, tolerance=tolerance, dot_notation=dot_notation):
yield item
return
Any updates on this?
@aorumbayev feel free to send a PR with a fix discussed above.
I'm trying to use dictdiffer to get the diff of two lists of dictionaries, but its not working well. It seems to get confused whenever the ordering of the dicts inside the list is not identical between the two lists, or when the number of dictionaries in one list is (slightly) different from the other list.
Is this functionality something that is currently possible and expected to work, or am I trying to accomplish something unsupported?
In case it matters, this is an example of a list of dictionaries that I'm trying to work with: