Open numist opened 4 years ago
Initial results from testBtree79ce96ab39Tof55ea8f456ByLine
:
club
┌───────────────────────────────────────────────────────────┐
│ Legend: │
│ │
│ 'n=' -- size of the collections being diffed │
│'|Δ|' -- count of individual changes in the diffs │
│ '==' -- count of element equality evaluations │
│ '⌛︎' -- wall clock time (when both algorithms exceed 10ms)│
└───────────────────────────────────────────────────────────┘
================================= hybrid/myers =================================
n=(10518,9878): |Δ|:2238/1832 (+22.2%) ==:160603/1690406 (9.5%) ⌛︎:0.25/0.95 (26.0%)
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
n=(10518,9878): |Δ|:1832/1832 (+0.0%) ==:323535/1690406 (19.1%) ⌛︎:0.25/0.95 (26.6%)
================================================================================
Unfortunately I don't really understand the heuristics for prudently deploying a divide and conquer approach (it would produce an inferior diff for an input like 0000001234
→ 1234000000
) and there are a lot of regressions (even comparing to myers
) in the PRNG test testByPercentageChangedDoubledUUID
.
I also suspect that a lot of the benefit promised by divide-and-conquer is already realized by club
in its eager match-seeking behaviour.
Pushed impatience-diff
.
Now that arrow diff is back, it should be totally possible to perform a diff that divides the input into smaller chunks and recurses:
At a high level, this is basically what patience diff does, but using arrow (ordered set diffing) instead of myers for computing the pivots (hence: impatience) and recursing instead of just accepting that the entirety of the subranges as different.