svenssonaxel / diff-format

Specification, reference implementations and test battery for Hintful diff format.

Hintful diff format

Why?

Today's diff formats aren't good enough. They lack:

Hintful diff format aims to address those issues in a way that is both forward and backward compatible, with minimal added complexity.

Support for non-line-based content

The following is a unified diff of a very simple code change:

A highlighted, unified diff of a simple change to Python code

In order to understand it you have to mentally compare the two versions, essentially running a diff algorithm in your head. Many tools that view diff files help you out by using an internal diff algorithm to "refine" it. Either way, you need a diff algorithm to make sense of the output from a diff algorithm. That is pretty bad. Here is a hintful diff format file of the same change:

A highlighted hintful diff of a simple change to Python code

Here you can see the changes at a glance, even though you lose the sense of the horizontal code layout. At least you don't have to do any mental diffing. Now take a look at how it's visualized:

A visualization of a hintful diff of a simple change to Python code

This is understood at a glance. It is also a dumb visualization, much simpler to implement than "diff refinement".
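To make the contrast concrete, here is a minimal sketch of the kind of "refinement" a unified-diff viewer has to perform. It takes a removed line and its added replacement and runs a second, character-level diff over them. It uses Python's standard `difflib.SequenceMatcher`. The function name `refine` and the sample lines are hypothetical, not taken from the example above or from any particular viewer.

```python
import difflib

def refine(removed: str, added: str) -> list[tuple[str, str, str]]:
    """Diff a removed line against its added replacement, character by
    character, and return (tag, old_fragment, new_fragment) for every
    region that differs. Tags are difflib's 'replace', 'delete', 'insert'."""
    matcher = difflib.SequenceMatcher(None, removed, added)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, removed[i1:i2], added[j1:j2]))
    return changes

# Hypothetical "-" / "+" pair from a unified diff hunk.
old_line = "    return total / count"
new_line = "    return total / max(count, 1)"
for tag, old, new in refine(old_line, new_line):
    print(f"{tag}: {old!r} -> {new!r}")
```

This is the second diff algorithm that the paragraph above complains about: the unified diff only says "this line became that line", and the viewer must reconstruct what actually changed inside it.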

The example above shows that current diff formats fall short even for everyday work. Support for non-line-based content also enables new use cases, such as diffing minified code. A unified diff would look ridiculous:

A highlighted unified diff of a simple change to minified JavaScript code

But the same thing in a hintful visualization is quite readable:

A visualization of a hintful diff of a simple change to minified JavaScript code
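As a rough illustration of why line-based diffs struggle with minified code, the sketch below uses Python's standard `difflib.unified_diff` (not a hintful tool) on a hypothetical one-line file. The file name `app.min.js` and its contents are invented for this example. Because the whole file is a single line, changing one character forces the hunk to remove and re-add the entire line.

```python
import difflib

# Hypothetical minified file: one long line, before and after a change to a
# single character ("1" becomes "2").
before = ["function f(a,b){return a+b}function g(x){return f(x,1)}\n"]
after = ["function f(a,b){return a+b}function g(x){return f(x,2)}\n"]

diff = list(difflib.unified_diff(before, after,
                                 fromfile="a/app.min.js", tofile="b/app.min.js"))
print("".join(diff), end="")

# Collect the "-" and "+" body lines, skipping the "---"/"+++" file headers.
changed = [line for line in diff
           if line[0] in "+-" and not line.startswith(("---", "+++"))]
print(f"{len(changed)} whole lines, {sum(len(l) - 1 for l in changed)} characters")
```

For real minified bundles the single line can be hundreds of kilobytes long, and all of it appears twice in the hunk.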

See the specification for details.

Enabling syntax-aware, composable tooling

There seem to have been many attempts over recent years, and maybe decades, to make diff tooling more readable for humans, often by making it syntax-aware.^difftastic^prettydiff These tools attempt to cover the whole pipeline: compare two versions of a tree, present the difference to the user in an easy-to-digest manner, and allow changes to be applied or merged. Despite their nice features, such all-in-one tools are not composable. Perhaps lack of composability is one reason why they fail to gain widespread adoption?

Unified diff format has long been at the center of a set of composable tools that produce or consume it. There are diff and git diff to compare two versions of a tree and produce a diff-format file; there are many UI tools to visualize ("refine"!) diff-format output for better human comprehension; and there are patch and git apply to consume a diff-format file.
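The sketch below shows what this composability means in practice: the producer and the consumer share nothing but the format. Python's standard `difflib.unified_diff` plays the producer, and `apply_unified` is a hypothetical, deliberately minimal consumer written for this illustration. It is not how patch or git apply work internally: it handles a single file, does no fuzzy matching, and raises an error if context lines don't match.

```python
import difflib
import re

# Matches hunk headers such as "@@ -3,4 +3,5 @@" (an omitted count means 1).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def apply_unified(original, diff_lines):
    """Apply a single-file unified diff to `original` (a list of lines).

    Minimal by design: no fuzz, no offset search, one file only.
    Raises ValueError if a context or removed line does not match.
    """
    result, pos, i = [], 0, 0
    while i < len(diff_lines):
        m = HUNK_RE.match(diff_lines[i])
        i += 1
        if not m:
            continue  # skip "---"/"+++" file headers
        old_start = int(m.group(1))
        old_len = int(m.group(2)) if m.group(2) is not None else 1
        # An empty old range names the line *before* the insertion point,
        # which is already a 0-based index; otherwise convert from 1-based.
        start = old_start if old_len == 0 else old_start - 1
        result.extend(original[pos:start])  # unchanged lines between hunks
        pos = start
        while i < len(diff_lines) and not diff_lines[i].startswith("@@"):
            tag, text = diff_lines[i][0], diff_lines[i][1:]
            i += 1
            if tag == "\\":
                continue  # "\ No newline at end of file" marker, ignored here
            if tag in " -":  # context or removal: must match the original
                if pos >= len(original) or original[pos] != text:
                    raise ValueError(f"hunk does not match at line {pos + 1}")
                pos += 1
            if tag in " +":  # context or addition: goes into the result
                result.append(text)
    result.extend(original[pos:])  # unchanged tail after the last hunk
    return result

old = ["a\n", "b\n", "c\n", "d\n"]
new = ["a\n", "B\n", "c\n", "d\n", "e\n"]
patch = list(difflib.unified_diff(old, new, fromfile="a/f.txt", tofile="b/f.txt"))
print(apply_unified(old, patch) == new)
```

Either side could be swapped for another tool that speaks the same format, which is exactly the property the all-in-one syntax-aware tools give up.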

However, unified diff format cannot support syntax-aware diff tools because it gives such tools no way to describe their findings. Hintful diff format aims to become the diff format at the center of both traditional and such advanced tooling by being 1) forward and backward compatible with unified diff format; 2) syntax-agnostic; 3) usable for patching, reverse patching and merging; 4) effortlessly inspectable by humans; 5) able to express changes at a finer granularity than one line of code; 6) expressive enough for common code change operations such as renaming, moving and refactoring; and 7) simple to understand.

This requires a new diff format, since

Support for renaming, moving and refactoring

The git version of unified diff format already supports renaming files, but no code movement of finer granularity can be expressed.

Hintful diff format has a feature called named snippets, which is a way to connect arbitrary sections of code on either side of the comparison. It is a simple but powerful feature that can be used to author any kind of variable renaming, code movement or refactoring. A simple example follows:

A visualization of a hintful diff of a simple refactoring

There are two named snippets here. Snippet E connects code added at one place with code removed from another place, effectively expressing a code movement. Snippet N connects three equal additions, thereby stating explicitly that they are the same thing.
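To illustrate the idea behind named snippets, the sketch below models each added or removed piece of code as a record tagged with an optional snippet name. It then reports what each snippet expresses: a move when it joins a removal with an addition, or repeated identical additions such as a new name introduced in several places. This does not use the specification's actual syntax. The `Change` record, the `classify_snippets` function, the example data and the rule that all members of a snippet carry identical text are all assumptions made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    """Hypothetical stand-in for one added or removed piece of code."""
    kind: str                      # "add" or "remove"
    where: str                     # free-form location label, e.g. "main.py:3"
    text: str                      # the added or removed text
    snippet: Optional[str] = None  # name of the snippet it belongs to, if any

def classify_snippets(changes):
    """Group changes by snippet name and describe what each snippet expresses.

    Assumes (for this sketch only) that all members of a snippet must carry
    identical text.
    """
    groups = defaultdict(list)
    for change in changes:
        if change.snippet is not None:
            groups[change.snippet].append(change)
    result = {}
    for name, members in groups.items():
        if len({m.text for m in members}) != 1:
            raise ValueError(f"snippet {name!r} connects differing text")
        kinds = {m.kind for m in members}
        if kinds == {"add", "remove"}:
            result[name] = "move"
        elif kinds == {"add"}:
            result[name] = f"{len(members)} identical additions"
        else:
            result[name] = f"{len(members)} identical removals"
    return result

# Hypothetical data, loosely shaped like the example above:
# E moves an expression, N ties together three additions of the same new name.
changes = [
    Change("remove", "main.py:3", "x * x + 1", snippet="E"),
    Change("add", "main.py:9", "x * x + 1", snippet="E"),
    Change("add", "main.py:1", "square_plus_one", snippet="N"),
    Change("add", "main.py:3", "square_plus_one", snippet="N"),
    Change("add", "main.py:7", "square_plus_one", snippet="N"),
]
print(classify_snippets(changes))
```

The point of the exercise is that a consumer does not need to infer these relationships; the producer states them explicitly.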

See the specification for details.

How to help

You can help by opening and discussing issues about

Roadmap

Step 1: Reach beta quality

This is where we are currently.

Critique hintful diff format as soon as possible so that it doesn't get finalized with an inherent problem. If you suspect a potential problem, open an issue even if you're not sure or don't have a solution.

Step 2: Tool implementation trial and finalization

After the beta is released, implement the format in your project in a non-production branch/release. We'd like a couple of different producers and consumers to try it out and reveal as many remaining problems as possible. This experience is used to finalize the format.

Step 3: Tool implementation

After the format is finalized, implement it in your project and feel free to release it.

For producers such as diff, git diff and semantic diff tools you could for example

For consumers such as patch, git apply and diff visualizers, you could for example

A plausible road to widespread adoption looks something like this: