sageserpent-open opened 1 month ago
Suppose we recursed on swathes purely to discover subproblems without actually building up partial results?
That might allow priming of a dynamic programming algorithm that then works bottom-up, but traversing only the subproblems actually needed by the top-level problem. I think we can infer the swathes in reverse by looking at the tuple indices, grouping tuples that look like alternate forks (see the sketch below).
Not easy, though.
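Something along these lines, as a very rough sketch in Scala - the `(base, left, right)` label shape and all the names here are assumptions for illustration, not anything in the codebase:

```scala
// Hedged sketch: recover the swathes from the labels of the discovered
// subproblems by grouping on the maximum sequence length in each tuple,
// ordered so that a bottom-up dynamic programming pass can replay them.
type SubproblemLabel = (Int, Int, Int) // assumed (base, left, right) lengths

def swathesBottomUp(
    discovered: Set[SubproblemLabel]
): Seq[Set[SubproblemLabel]] =
  discovered
    .groupBy { case (base, left, right) => base max left max right }
    .toSeq
    .sortBy { case (maximumLength, _) => maximumLength }
    .map { case (_, swathe) => swathe }
```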
Given that the top-down opportunistic approach sometimes has to evaluate a large fraction of the total number of subproblems that the dynamic programming algorithm evaluates, is it likely that the overhead of the more complex implementation will yield benefits rather than losses in many situations?
Marking this as wontfix for now; let's see how we get on with straightforward dynamic programming.
This is a follow-on to #93.
There was a performance showdown between evaluating the longest common subsequence using a bottom-up, exhaustive dynamic programming approach and a top-down, opportunistic recursive approach using memoisation.
Given the constraints of avoiding stack overflow and keeping heap usage down, the dynamic programming approach won decisively.
The vanquished top-down recursive code can be found via the tag `issue-93-top-down-recursive-lcs`.

One thing that was not considered at the time was inspired by the use of computing Fibonacci numbers as a toy model for how to structure the LCS implementation. Both problems allow bottom-up dynamic programming and top-down recursion with memoisation, although the Fibonacci number generation does not afford any opportunistic optimisation - all of the subproblems have to be evaluated.
It is well-known that Fibonacci numbers don't actually require a recursive evaluation over shared subproblems, even though their definition naturally leads to this; instead, one can compute pairs of adjacent Fibonacci numbers using simple recursion that doesn't branch over shared subproblems (in fact it doesn't branch at all - each subproblem directly depends on just one simpler subproblem, or is the terminating case of the recursion). If need be, the recursion can also be made tail-recursive by working up the subproblem input sizes.
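For reference, that non-branching scheme might look like this in Scala - a minimal sketch, nothing specific to the actual codebase:

```scala
import scala.annotation.tailrec

// Each step depends on exactly one simpler subproblem - the pair of adjacent
// Fibonacci numbers - and the recursion works up tail-recursively.
def fibonacci(n: Int): BigInt =
  @tailrec
  def step(remaining: Int, previous: BigInt, current: BigInt): BigInt =
    if remaining == 0 then current
    else step(remaining - 1, current, previous + current)

  if n == 0 then BigInt(0) else step(n - 1, BigInt(0), BigInt(1))
```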
A sketchy insight occurred to me regarding the input tuples of sequence lengths across the sides, where a tuple is the input to (and essentially also a label of) a corresponding subproblem. If we partition the subproblems by the maximum sequence length in any given tuple, we end up with sets of tuples that are swathes across the subproblem space - any swathe for a given maximum sequence length does not depend on swathes that have a higher maximum sequence length, and indeed only depends directly on the swathe whose maximum sequence length is one less than its own (or it is the terminating swathe).
What's going on here is that any tuple in a swathe is computed from subproblems that either belong to the same swathe or to the next one along, going down in maximum sequence length. The size of a swathe has a loose upper bound of
`1 + baseSequenceSize * leftSequenceSize + baseSequenceSize * rightSequenceSize + leftSequenceSize * rightSequenceSize`
- actual swathe sizes can be lower.

So the top-down recursion can be recast as a simpler, no-branching scheme that works in swathes; there is only one recursion step from one swathe to the next. What is crucial is to avoid having each subproblem in a swathe waiting in suspense for the subproblems in the following swathe to complete before it can finish executing. If we allowed that to happen, we would end up with lots of subproblems holding space on the heap - we want to be able to work swathe-by-swathe, allowing subproblems to be purged so as not to exhaust the heap.
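To make the containment argument concrete, here is a hedged over-approximation of a subproblem's dependencies, reusing the assumed label type from the sketch above; which of these are actually consulted depends on element matches, but none of them can reach further down than the next swathe:

```scala
// Each recursive step drops at most one element from each side, so every
// dependency of a tuple whose maximum length is m has maximum length m or
// m - 1 - in other words, it lies in the same swathe or the next one along.
def dependencyOverApproximation(
    label: SubproblemLabel
): Set[SubproblemLabel] =
  val (base, left, right) = label
  for
    b <- Set(base, base - 1)
    l <- Set(left, left - 1)
    r <- Set(right, right - 1)
    if (b, l, r) != label && b >= 0 && l >= 0 && r >= 0
  yield (b, l, r)
```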
What we could do is to work in tail-recursive fashion down the swathes, starting with the longest sequence length for the initial swathe and recursing down to the zero-length swathe to terminate the recursion.
For any tuple in a swathe, we can't just evaluate the tuple subproblems it depends on, because at least one of them belongs to the following swathe. So we maintain a partial result in reverse order, passing it down through each subproblem and prepending the element chosen by a fork to that partial result. The call to the subproblem with its modified partial result is either executed there and then, if it belongs to the same swathe, or is parked in a continuation (or something similar) that is made part of the following swathe.
Working tail-recursively means that we can't choose the maximal subproblem amongst the forks as the recursion unwinds, so instead we harvest all the partial results in the terminating swathe and choose the longest.
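To convince myself the scheme hangs together, here is a runnable toy in Scala of the ordinary two-sided LCS worked swathe-by-swathe, standing in for the three-sided version - all the names are hypothetical, and because elements are met from the tail ends of the sequences, the prepends happen to leave the partials in forward order here:

```scala
import scala.annotation.tailrec

// Toy swathe-by-swathe LCS over two sides: labels are (leftLength,
// rightLength) prefix lengths, a swathe holds the labels sharing a maximum
// length, and each label carries its partial result down with it rather than
// waiting for the recursion to unwind.
def longestCommonSubsequence[Element](
    left: IndexedSeq[Element],
    right: IndexedSeq[Element]
): List[Element] =
  type Label = (Int, Int)

  // Drain the current swathe: execute subproblems whose maximum length is
  // `maximumLength`, parking those whose labels drop into the next swathe.
  @tailrec
  def drainSwathe(
      maximumLength: Int,
      worklist: List[(Label, List[Element])],
      parked: List[(Label, List[Element])],
      completed: List[List[Element]]
  ): (List[(Label, List[Element])], List[List[Element]]) =
    worklist match
      case Nil => (parked, completed)
      case ((i, j), partialResult) :: rest =>
        if i == 0 || j == 0 then
          // Terminating subproblem: its partial result is final.
          drainSwathe(maximumLength, rest, parked, partialResult :: completed)
        else if left(i - 1) == right(j - 1) then
          // A match: prepend the element; the dependency's maximum length is
          // one less, so it always parks for the following swathe.
          val dependency = ((i - 1, j - 1), left(i - 1) :: partialResult)
          drainSwathe(maximumLength, rest, dependency :: parked, completed)
        else
          // A fork: each alternative either stays in this swathe or drops
          // down by exactly one - never further.
          val alternatives =
            List(((i - 1, j), partialResult), ((i, j - 1), partialResult))
          val (sameSwathe, nextSwathe) = alternatives.partition {
            case ((l, r), _) => l.max(r) == maximumLength
          }
          drainSwathe(maximumLength, sameSwathe ++ rest,
            nextSwathe ++ parked, completed)

  // Work down the swathes tail-recursively, throwing each one away once
  // drained; harvest the longest completed partial result at the end.
  @tailrec
  def acrossSwathes(
      maximumLength: Int,
      pending: List[(Label, List[Element])],
      completed: List[List[Element]]
  ): List[Element] =
    if pending.isEmpty then completed.maxByOption(_.length).getOrElse(Nil)
    else
      val (parked, nowCompleted) =
        drainSwathe(maximumLength, pending, Nil, completed)
      acrossSwathes(maximumLength - 1, parked, nowCompleted)

  acrossSwathes(
    left.size.max(right.size),
    List(((left.size, right.size), Nil)),
    Nil
  )
```

Note that nothing in this toy deduplicates labels reached along different fork paths, so the worklist can grow exponentially on mismatch-heavy inputs - which is exactly the duplication concern raised below.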
As we throw old swathes away when tail-recursing, I hope this keeps heap usage down. It does mean, though, that the structure sharing between the partial solutions duplicates the tree of subproblems explored - perhaps this is still too much demand on the heap.
At this point, I'm not so sure this is a good idea - it's complicated enough as it is. Possibly some kind of off-heap cache could be used to store either the partial solution tree or even the swathes themselves (if we allow the swathes to recreate the unwinding avoided by the tail-recursion, assuming we build the partial solutions up from the bottom).
Food for thought...