Improve the application of tries to diffing (plus some other ideas)

This fixes #2.

There are a bunch of lessons here, some expected and others surprising:

avoiding indexing the shared prefix/suffix by adding in: Range to trie init is a minor (but free!) win
adding against: [knownRemoves|knownInserts] to trie init pays for itself twice:
1. less work building the trie
2. trie returns faster from lookups when the needle includes obvious elements
combined with the other optimizations in Club, tries are not very useful when the alphabet is relatively small. This is most obvious in e.g. testShuffledUUID vs testRandomBitBuffers
- the current heuristics for deploying tries use alphabet.count > diffRange.count / 2, which is pretty good but probably has room for improvement
limiting the width of the work queue is extremely effective and affects diff quality less negatively than n-gram tries
despite this, tries provide further improvement for tests like testReversed and testBtree79ce96ab39To93ba69ec97ByLine, but only if the diff is restarted after the trie is created. the introduction of the trie to the diff loop limits the proliferation of edit paths, but it doesn't tend to reduce them until a sufficiently long snake is found to outpace them in the work queue, which never happens when diffing a collection against its reverse
_Alphabet tracks element offsets again, and supports lookup via .offset(of:,after:). this makes possible some less lossy variants of the Arrow diff optimizations from NSOrderedSet:
- if the edit path has passed the last instance of one element in the other collection, only one edit is emitted for that round
- if both elements exist ahead in the other collection, but one is much further away than the other, the nearest match is pursued
  - the heuristics for "much further away" could probably use some tuning
changing var tmp = f[e, default: 0]; f[e] = tmp + 1 → f[e, default: 0] += 1 significantly reduces dictionary lookups, and a variant of this that abuses a function with an inout parameter allows the trick to be used when constructing a trie.
- strangely, the improvement is not always 50% and I have no idea why

Here's a visualization of how the workqueue size differs for each round of the diffing algorithm for the first hundred rounds of a couple of the tests in the suite when tries are disabled. I used it as a starting point for experimenting with on-demand trie creation.

There's a couple neat things here:

testReversed, testShuffledUUID, testOrderedSetPromotable@n=500, and testOrderedSetPromotable@n=100 are the most obvious beneficiaries of trie lookups, being degenerate inputs for Myers' diff
testOrderedSetPromotable@n=50 clearly shows the work queue getting smaller as the algorithm passes the half-way point of the edit graph
testLoremIpsums, testMuspiMerol, and testRandomLetterStrings@500 are also clustered together. empirically, limiting the workqueue size was far more effective at producing a faster (and smaller!) diff than applying n-gram trie lookup to these tests.
- throughout this process, testLoremIpsums has been the hardest test to get good results from
tests in the bottom cluster (like testBinaryByPercentageChanged@95%) show how effective the workqueue is at dropping paths that aren't keeping up every time a sufficiently long snake is encountered
testBtree79ce96ab39Tof55ea8f456ByLinePerf is a real world test between btree.c@79ce96ab39 and btree.c@f55ea8f456, having a long prologue of minor/sparse edits followed by a dense cluster of changes that quickly proliferates the size of the work queue (slope > 1!). Trie lookup is not faster than limiting the workqueue size but both approaches run much faster than Myers'.

numist / Diffing-Explorations

Improve the application of tries to diffing (plus some other ideas) #4