trailofbits / graphtage

A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.
GNU Lesser General Public License v3.0

Best approach to use graphtage to create a "diff" of an already existing tree? #75

Open SamWilsn opened 1 year ago

SamWilsn commented 1 year ago

Hey! Sorry for the broad scope of this question, but I'm having some trouble wrapping my head around how to use this library to generate a "diff" or "patch" for an already existing tree structure.

Background

We're writing a specification and want to highlight changes between versions of it. Currently we use Sphinx and a bunch of custom tooling to render the documents. You can see what the current documentation looks like here.

We've hit a few walls with this approach and have decided to move to a custom solution. You can see what that looks like over here.

The new tooling, docc, represents each document as a tree of nodes. This is what we want to calculate a diff between. For example, this is how we represent Python types.

Goal

I'd like to take two document trees, perform some magic with graphtage, and get out a third tree representing what's changed. Here's a really simple example:

Before

```mermaid
graph TD;
    A-->B
    A-->C
```

After

```mermaid
graph TD;
    A-->B
    A-->D
```

Output

```mermaid
graph TD;
    A-->B
    A-->Diff
    Diff-->|removed| C
    Diff-->|added| D
```
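To make the desired output shape concrete, here is a minimal pure-Python sketch of the example above. It does not use graphtage at all: `Node`, `Diff`, and `naive_diff` are hypothetical names, not docc or graphtage types. Children are matched by label only, which assumes sibling labels are unique, whereas a real semantic diff would search for the best matching.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a document tree node."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Diff:
    """Hypothetical node grouping removed and added subtrees."""
    removed: list = field(default_factory=list)
    added: list = field(default_factory=list)


def naive_diff(before: Node, after: Node) -> Node:
    """Merge two trees whose roots match into one tree with Diff children.

    Assumption: sibling labels are unique, so matching by label is enough.
    """
    result = Node(after.label)
    after_by_label = {child.label: child for child in after.children}
    before_labels = {child.label for child in before.children}

    removed = []
    for child in before.children:
        match = after_by_label.get(child.label)
        if match is None:
            # Present before, missing after.
            removed.append(child)
        else:
            # Present in both: recurse to find nested changes.
            result.children.append(naive_diff(child, match))

    # Present after, missing before.
    added = [c for c in after.children if c.label not in before_labels]

    if removed or added:
        result.children.append(Diff(removed, added))
    return result


before = Node("A", [Node("B"), Node("C")])
after = Node("A", [Node("B"), Node("D")])
output = naive_diff(before, after)
```

Here `output` is `A` with two children: the unchanged `B`, and a `Diff` whose `removed` list holds `C` and whose `added` list holds `D`.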

Issues

I've tried the approach from the "Diffing In-Memory Python Objects" section of the documentation, but received a UTF-8 decoding error.

I've tried subclassing LeafNode and ContainerNode, but I'm not familiar enough with graphtage to implement them correctly.

Current Approach

My current approach is to map my tree objects into FixedKeyDictNode instances, then back into my tree objects. Now I'm stuck figuring out how to convert the EditedTreeNode instances into the Diff(before, after) style node from the above example. What would you suggest?
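The last step I'm after might look something like the sketch below. It assumes the edited tree can first be reduced to nodes carrying one of three statuses. The `Tagged`, `Plain`, and `Diff` classes and the `status` field are hypothetical, and how to read that status off an `EditedTreeNode` is exactly the part I don't know, so it is not shown.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Tagged:
    """Hypothetical intermediate node: a tree node plus an edit status."""
    label: str
    status: str = "same"  # assumed values: "same", "removed", "added"
    children: list = field(default_factory=list)


@dataclass
class Plain:
    """Hypothetical stand-in for one of my own tree objects."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Diff:
    """The Diff(before, after) style node from the example."""
    before: list
    after: list


def to_plain(node: Tagged) -> Plain:
    """Drop the status tags from a whole subtree."""
    return Plain(node.label, [to_plain(c) for c in node.children])


def fold(node: Tagged) -> Plain:
    """Collapse each run of adjacent changed children into one Diff node."""
    out = Plain(node.label)
    runs = groupby(node.children, key=lambda c: c.status != "same")
    for changed, run in runs:
        run = list(run)
        if changed:
            out.children.append(Diff(
                before=[to_plain(c) for c in run if c.status == "removed"],
                after=[to_plain(c) for c in run if c.status == "added"],
            ))
        else:
            # Unchanged children may still contain changes deeper down.
            out.children.extend(fold(c) for c in run)
    return out


tagged = Tagged("A", children=[
    Tagged("B"),
    Tagged("C", status="removed"),
    Tagged("D", status="added"),
])
folded = fold(tagged)
```

Grouping adjacent changes keeps a replacement such as `C` to `D` in a single `Diff` node instead of two separate ones, which is what the output diagram shows.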