open-contracting / ocdskit

A suite of command-line tools for working with OCDS data
https://ocdskit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Library method to "diff" releases #160

Open jpmckinney opened 3 years ago

jpmckinney commented 3 years ago

https://pypi.org/project/deepdiff/ is a great library for comparing generic dictionaries. However, if we want to diff two "full" releases in order to calculate a minimal release to publish as part of a release history, we need specialized code that is aware of the merging routine. For example:

This feature would be relevant to:

Requested by @dwasyl

dwasyl commented 3 years ago

Just for some added context, this would be especially helpful because the system at OpenNWT can only generate a 'full' release. The full releases we generate don't account for a few things, mostly deletions from lists and deletions of fields.

A tool to help create minimal diff releases would save some space, but also create better releases that would merge properly.
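To illustrate why a 'full' release cannot express a deletion, here is a minimal sketch. `simple_merge` is a hypothetical, heavily simplified stand-in for the OCDS merge routine (nested objects only, with null meaning "remove this field"); it is not the code that ocdskit or any other library actually uses, and the field values are made up.

```python
def simple_merge(compiled, release):
    """Hypothetical, simplified merge: nested objects only, None removes a field."""
    result = dict(compiled)
    for key, value in release.items():
        if value is None:
            # A null in the newer release removes the field.
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested objects field by field.
            result[key] = simple_merge(result[key], value)
        else:
            # Anything else overwrites the earlier value.
            result[key] = value
    return result


first = {"tender": {"title": "Road repair", "description": "Draft text"}}

# A 'full' second release that simply omits the deleted field...
second_full = {"tender": {"title": "Road repair"}}
# ...versus a minimal release that deletes it explicitly.
second_minimal = {"tender": {"description": None}}

merged_full = simple_merge(first, second_full)
merged_minimal = simple_merge(first, second_minimal)
```

In this sketch, `merged_full` still carries the old `description`, because an omitted field says nothing to the merge, while `merged_minimal` drops it.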

Specifically, making a diff release from two releases is somewhat different from making one from two compiledReleases (compiledReleases are more absolute, so more assumptions could be made). Either or both would be helpful in my use case.

jpmckinney commented 3 years ago

@dwasyl If a field is missing in the second release, should the diff set that field to null, or just not mention it? Similarly, if an object in an array is missing, should it set all its fields except id to null?

dwasyl commented 3 years ago

@jpmckinney I was thinking about this after we spoke. For my purposes, if it's missing then it should be null since otherwise the system would have included it in the release. This assumes the two releases being diffed are explicit (so are essentially equivalent to compiledReleases).

For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.
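The configurable behaviour discussed here could look roughly like the sketch below. Everything in it is an assumption for illustration: `diff_release`, its `missing` parameter and the helper functions are hypothetical names, not part of ocdskit. It assumes arrays of objects that all have an `id` are matched by `id`, and that any other array is replaced wholesale. It also ignores release metadata such as `ocid`, `id`, `date` and `tag`, which a real release would need to keep.

```python
def _is_id_array(value):
    """True for a non-empty list whose items are all objects with an 'id'."""
    return (
        isinstance(value, list)
        and bool(value)
        and all(isinstance(item, dict) and "id" in item for item in value)
    )


def _diff_id_array(old, new, missing):
    """Diff two arrays of objects, matching items by their 'id'."""
    old_by_id = {item["id"]: item for item in old}
    new_ids = set()
    result = []
    for item in new:
        new_ids.add(item["id"])
        previous = old_by_id.get(item["id"])
        if previous is None:
            result.append(item)  # brand-new object: include it whole
        else:
            changes = diff_release(previous, item, missing)
            if changes:
                # Keep the id so a merge can match the object.
                result.append({"id": item["id"], **changes})
    if missing == "null":
        for item in old:
            if item["id"] not in new_ids:
                # Object disappeared: null every field except id.
                result.append({k: (v if k == "id" else None) for k, v in item.items()})
    return result


def diff_release(old, new, missing="null"):
    """Hypothetical sketch: return only what changed from `old` to `new`.

    missing="null" sets fields absent from `new` to None;
    missing="ignore" leaves them out of the diff.
    """
    if missing not in ("null", "ignore"):
        raise ValueError("missing must be 'null' or 'ignore'")
    diff = {}
    for key, new_value in new.items():
        if key not in old:
            diff[key] = new_value
            continue
        old_value = old[key]
        if old_value == new_value:
            continue  # unchanged: omit from the diff
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            nested = diff_release(old_value, new_value, missing)
        elif _is_id_array(old_value) and _is_id_array(new_value):
            nested = _diff_id_array(old_value, new_value, missing)
        else:
            diff[key] = new_value  # scalars and other arrays: replace
            continue
        if nested:
            diff[key] = nested
    if missing == "null":
        for key in old:
            if key not in new:
                diff[key] = None
    return diff


old = {"ocid": "ocds-1", "tender": {"title": "A", "description": "d",
       "items": [{"id": "1", "quantity": 5}, {"id": "2", "quantity": 3}]}}
new = {"ocid": "ocds-1", "tender": {"title": "B",
       "items": [{"id": "1", "quantity": 6}]}}

with_nulls = diff_release(old, new, missing="null")
without_nulls = diff_release(old, new, missing="ignore")
```

With `missing="null"`, the dropped `description` and the dropped item `"2"` appear in the diff as nulls; with `missing="ignore"`, only the changed `title` and `quantity` appear.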

jpmckinney commented 3 years ago

> For a more generic tool, it might be a desirable configurable option? Missing fields are either ignored or set to null.

Yes, I was thinking the same :)

dwasyl commented 3 years ago

That way it'd work even for publishers who are only able to put out a single release at any given time. If someone had saved those releases along the way, they could build a mini release history if they needed to for some reason (which is essentially what OpenNWT does: it scrapes point-in-time measures).