Performing a word diff over a full file can be fairly slow on large files.
A better approach is to perform a line diff first and then perform the word diff only on the changes it finds.
While this is already possible with `imara-diff`, it requires quite a bit of legwork and can be tricky to get right.
It would be nice if this could be included in the library directly.
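A minimal sketch of the two-stage approach in plain Rust, without `imara-diff` itself. The names `diff`, `LineHunk`, and `line_then_word_diff` are hypothetical, words are assumed to be whitespace-separated, and the quadratic LCS table is a stand-in for imara-diff's real diff algorithms. It only illustrates the control flow: diff lines first, then run a word diff restricted to each changed line hunk.

```rust
use std::ops::Range;

/// Stand-in diff (NOT imara-diff's algorithm): a plain LCS table, O(n*m) time
/// and memory. Returns changed hunks as (before range, after range) pairs,
/// similar in shape to what a `Sink` receives.
fn diff<T: PartialEq>(a: &[T], b: &[T]) -> Vec<(Range<usize>, Range<usize>)> {
    let (n, m) = (a.len(), b.len());
    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut hunks = Vec::new();
    let (mut i, mut j, mut start) = (0, 0, None);
    while i < n || j < m {
        if i < n && j < m && a[i] == b[j] {
            // Matching token: close any open hunk.
            if let Some((si, sj)) = start.take() {
                hunks.push((si..i, sj..j));
            }
            i += 1;
            j += 1;
        } else {
            start.get_or_insert((i, j));
            // Step along whichever side keeps the LCS maximal.
            if j < m && (i == n || lcs[i][j + 1] >= lcs[i + 1][j]) {
                j += 1; // token inserted in `b`
            } else {
                i += 1; // token removed from `a`
            }
        }
    }
    if let Some((si, sj)) = start {
        hunks.push((si..n, sj..m));
    }
    hunks
}

/// Hypothetical output type: a line hunk plus the word hunks found inside it.
#[derive(Debug)]
struct LineHunk {
    before: Range<usize>,                     // line indices in the old text
    after: Range<usize>,                      // line indices in the new text
    words: Vec<(Range<usize>, Range<usize>)>, // word indices within this hunk
}

fn line_then_word_diff(old: &str, new: &str) -> Vec<LineHunk> {
    let old_lines: Vec<&str> = old.lines().collect();
    let new_lines: Vec<&str> = new.lines().collect();
    diff(&old_lines, &new_lines)
        .into_iter()
        .map(|(before, after)| {
            // Only the changed lines are split into words, so the word diff
            // runs on small inputs instead of the whole file.
            let old_words: Vec<&str> = old_lines[before.clone()]
                .iter()
                .flat_map(|l| l.split_whitespace())
                .collect();
            let new_words: Vec<&str> = new_lines[after.clone()]
                .iter()
                .flat_map(|l| l.split_whitespace())
                .collect();
            let words = diff(&old_words, &new_words);
            LineHunk { before, after, words }
        })
        .collect()
}
```

For example, diffing `"fn main() {\n    let x = 1;\n}\n"` against the same text with `x` replaced by `y` yields a single line hunk `1..2 → 1..2` whose only word hunk is `1..2 → 1..2`, i.e. just the `x` → `y` token.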
Implementing this involves multiple steps:
- Determine the output format: a different trait, or force collecting into a `Vec`?
- Implement a `TokenSource` for words
- Implement a `Sink` that automatically computes a word diff
- Potentially implement a heuristic to detect and ignore
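The word tokenizer step could start from something like the sketch below. The function `tokenize_words` and its character classes are hypothetical choices, not part of `imara-diff`: runs of alphanumeric/underscore characters and runs of whitespace each become one token, and every other character (punctuation) is its own token. The token list could then back a `TokenSource` for words; concatenating the tokens reproduces the input exactly, which keeps whitespace intact when rendering a colored word diff.

```rust
/// Hypothetical character classes used to group characters into tokens.
#[derive(PartialEq, Clone, Copy)]
enum Class {
    Word,
    Space,
    Other,
}

fn class(c: char) -> Class {
    if c.is_alphanumeric() || c == '_' {
        Class::Word
    } else if c.is_whitespace() {
        Class::Space
    } else {
        Class::Other
    }
}

/// Splits `s` into word, whitespace, and single-punctuation tokens.
/// Tokens borrow from `s`, and their concatenation equals `s`.
fn tokenize_words(s: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev: Option<Class> = None;
    // `char_indices` yields byte offsets, so slicing stays on char boundaries.
    for (idx, c) in s.char_indices() {
        let cls = class(c);
        if let Some(p) = prev {
            // Start a new token on a class change, and before every punctuation char.
            if p != cls || cls == Class::Other {
                tokens.push(&s[start..idx]);
                start = idx;
            }
        }
        prev = Some(cls);
    }
    if start < s.len() {
        tokens.push(&s[start..]);
    }
    tokens
}
```

For instance, `"let x = foo(1);"` becomes `let`, ` `, `x`, ` `, `=`, ` `, `foo`, `(`, `1`, `)`, `;`. Treating whitespace as tokens rather than discarding it is a design choice; a whitespace-insensitive mode could filter them out before diffing.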
The diff algorithm in git only operates on lines. It is worth looking into what exactly git uses to produce a colored word diff from the line diff.
Perhaps a different algorithm is a better fit?